Guardrails: Reducing Risky Outputs
How can a developer ensure that API calls from a traditional application to an LLM return suitably structured, unbiased, and safe outputs?
Put another way: just because you prompt the LLM to respond in JSON does not mean you will always get valid JSON back.
A better approach is to include an example of the output you want to receive in the prompt. Doing this manually for one or two prompts is fine, but it is not bulletproof.
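By hand, that might look something like the prompt below (the schema and field names are purely illustrative):

```python
# Illustrative only: a hand-written prompt that shows the model the exact
# JSON shape we expect back. The field names are made up for this example.
PROMPT_TEMPLATE = """
Summarise the vulnerability report below.

Respond with ONLY valid JSON in exactly this shape:
{{"title": "Log4Shell RCE", "severity": "critical", "summary": "One paragraph."}}

Report:
{report_text}
"""

prompt = PROMPT_TEMPLATE.format(report_text="...")  # fill in the report to analyse
```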
Enter guardrails…
Guardrails is a Python package that lets a user add structure, type, and quality guarantees to the outputs of large language models (LLMs). Guardrails:

- does Pydantic-style validation of LLM outputs, including semantic validation such as checking for bias in generated text or bugs in generated code,
- takes corrective actions (e.g. re-asking the LLM) when validation fails,
- enforces structure and type guarantees (e.g. JSON).
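A minimal sketch of what that can look like, assuming the `Guard.from_pydantic` entry point from the guardrails-ai package; the exact call signatures have changed between releases, so treat this as illustrative rather than copy-paste ready:

```python
# Sketch only: guardrails-ai's API differs between releases, so the exact
# signatures below are assumptions; check the docs for your installed version.
import guardrails as gd
import openai
from pydantic import BaseModel, Field


class Finding(BaseModel):
    """The structure every LLM response must conform to."""
    title: str = Field(description="Short name of the security finding")
    severity: str = Field(description="One of: low, medium, high, critical")
    summary: str = Field(description="One-paragraph plain-English description")


# Build a Guard from the Pydantic model. Guardrails injects the schema into
# the prompt, validates the LLM's response against it, and can re-ask the
# model when validation fails.
guard = gd.Guard.from_pydantic(output_class=Finding)

result = guard(
    openai.chat.completions.create,  # the LLM callable being wrapped
    messages=[{"role": "user",
               "content": "Summarise CVE-2021-44228 as a security finding."}],
    model="gpt-4",
)

# Depending on the guardrails version the result is a ValidationOutcome object
# or a (raw_output, validated_output) tuple; either way, downstream code only
# ever sees output that matches the Finding schema.
print(result)
```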
Under the hood, the project wraps the LLM call in prompt engineering that describes the required output structure, detects when a response fails to parse or validate, and resubmits it to the model.

This brings predictability to LLM outputs, at the cost of writing an XML spec describing your requirements.
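The core loop that Guardrails automates is roughly the following pattern; this is a hand-rolled sketch of the idea, not the library's actual implementation:

```python
# Hand-rolled sketch of the validate-and-re-ask pattern, not Guardrails itself.
import json

from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI()  # reads OPENAI_API_KEY from the environment


class Finding(BaseModel):
    title: str
    severity: str
    summary: str


def ask_with_reask(prompt: str, max_attempts: int = 3) -> Finding:
    """Call the LLM, validate the reply, and re-ask with the error on failure."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_attempts):
        reply = client.chat.completions.create(model="gpt-4", messages=messages)
        text = reply.choices[0].message.content
        try:
            # Structure check (valid JSON) plus type check (matches Finding).
            return Finding(**json.loads(text))
        except (json.JSONDecodeError, ValidationError) as err:
            # Feed the failure back to the model and ask for a corrected reply.
            messages.append({"role": "assistant", "content": text})
            messages.append({
                "role": "user",
                "content": f"That response failed validation: {err}. "
                           "Reply again with ONLY corrected JSON.",
            })
    raise RuntimeError("No valid Finding after re-asking the model")
```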
Related Posts

- Flow Engineering for High Assurance Code: Open-source AlphaCodium brings back the adversarial concept to produce high integrity code and provides a path for Policy-as-code AI Security Systems
- AI Security is Probabilistic Security: Emergent Challenges: Prompt Injections and Ensuring AI Security in an Unpredictable Landscape
- 7 Critical Factors in the AI-AppSec Risk Equation: Key factors I consider before integrating Large Language Models (LLMs) into the SDLC