Guardrails: Reducing Risky Outputs
How can a developer ensure that API calls from a traditional application to an LLM return suitably structured, unbiased, and safe outputs?
Put another way: just because you prompt the LLM to respond in JSON does not mean you will always get valid JSON back.
A better approach is to include an example of the output you want to receive in the prompt. Doing this manually for one or two prompts is fine, but it is not bulletproof.
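By hand, that might look something like the prompt below (the schema and field names are purely illustrative):

```python
# Illustrative only: a hand-written prompt that shows the model the exact
# JSON shape we expect back. The field names are made up for this example.
PROMPT_TEMPLATE = """
Summarise the vulnerability report below.

Respond with ONLY valid JSON in exactly this shape:
{{"title": "Log4Shell RCE", "severity": "critical", "summary": "One paragraph."}}

Report:
{report_text}
"""

prompt = PROMPT_TEMPLATE.format(report_text="...")  # fill in the report to analyse
```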
Enter guardrails…
Guardrails is a Python package that lets a user add structure, type, and quality guarantees to the outputs of large language models (LLMs). Guardrails:

- does Pydantic-style validation of LLM outputs, including semantic validation such as checking for bias in generated text or bugs in generated code,
- takes corrective actions (e.g. re-asking the LLM) when validation fails,
- enforces structure and type guarantees (e.g. JSON).
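A minimal sketch of what that can look like, assuming the `Guard.from_pydantic` entry point from the guardrails-ai package; the exact call signatures have changed between releases, so treat this as illustrative rather than copy-paste ready:

```python
# Sketch only: guardrails-ai's API differs between releases, so the exact
# signatures below are assumptions; check the docs for your installed version.
import guardrails as gd
import openai
from pydantic import BaseModel, Field


class Finding(BaseModel):
    """The structure every LLM response must conform to."""
    title: str = Field(description="Short name of the security finding")
    severity: str = Field(description="One of: low, medium, high, critical")
    summary: str = Field(description="One-paragraph plain-English description")


# Build a Guard from the Pydantic model. Guardrails injects the schema into
# the prompt, validates the LLM's response against it, and can re-ask the
# model when validation fails.
guard = gd.Guard.from_pydantic(output_class=Finding)

result = guard(
    openai.chat.completions.create,  # the LLM callable being wrapped
    messages=[{"role": "user",
               "content": "Summarise CVE-2021-44228 as a security finding."}],
    model="gpt-4",
)

# Depending on the guardrails version the result is a ValidationOutcome object
# or a (raw_output, validated_output) tuple; either way, downstream code only
# ever sees output that matches the Finding schema.
print(result)
```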
Under the hood, the project wraps the LLM call in prompt engineering that describes the required output structure, detects when a response fails to parse or validate, and resubmits it to the model.

This brings predictability to LLM outputs, at the cost of writing an XML spec describing your requirements.
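The core loop that Guardrails automates is roughly the following pattern; this is a hand-rolled sketch of the idea, not the library's actual implementation:

```python
# Hand-rolled sketch of the validate-and-re-ask pattern, not Guardrails itself.
import json

from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class Finding(BaseModel):
    title: str
    severity: str
    summary: str


def ask_with_reask(prompt: str, max_attempts: int = 3) -> Finding:
    """Call the LLM, validate the reply, and re-ask with the error on failure."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_attempts):
        reply = client.chat.completions.create(model="gpt-4", messages=messages)
        text = reply.choices[0].message.content
        try:
            # Structure check (valid JSON) plus type check (matches Finding).
            return Finding(**json.loads(text))
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the failure back to the model and ask for a corrected reply.
            messages.append({"role": "assistant", "content": text})
            messages.append({
                "role": "user",
                "content": f"That response failed validation: {err}. "
                           "Reply again with ONLY corrected JSON.",
            })
    raise RuntimeError("No valid Finding after re-asking the model")
```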
Related Posts

- Flow Engineering for High Assurance Code: Open-source AlphaCodium brings back the adversarial concept to produce high integrity code and provides a path for Policy-as-code AI Security Systems
- AI Security is Probabilistic Security: Emergent Challenges: Prompt Injections and Ensuring AI Security in an Unpredictable Landscape
- 7 Critical Factors in the AI-AppSec Risk Equation: Key factors I consider before integrating Large Language Models (LLMs) into the SDLC