Prompt Injection Defence by Task-specific Fine-tuning
The LLMs we interact with are designed to follow instructions, which makes them vulnerable to prompt injection. But what if we abandon that generalized instruction-following ability and instead fine-tune a non-instruction-tuned base model to perform only the specific task our LLM-integrated application requires?
A joint research paper led by UC Berkeley explores exactly this approach:
We present Jatmo, a framework for generating task-specific LLMs that are impervious to prompt-injection attacks. Jatmo bootstraps existing instruction-tuned language models to generate a dataset for a specific task and uses this dataset to fine-tune a different base model. Doing so yields task-specific models that match the performance of standard models, while reducing the success rate of prompt-injection attacks from 87% to approximately 0%. We therefore suggest that Jatmo seems like a practical method for protecting LLM-integrated applications against prompt-injection attacks.
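To make the two-stage idea concrete, here is a minimal sketch of a Jatmo-style pipeline, assuming the OpenAI Python SDK. The summarization task, the teacher prompt, the helper names, and the model choices (`gpt-4o-mini` as the instruction-tuned teacher, `davinci-002` as the non-instruction-tuned base) are illustrative assumptions, not the paper's exact setup: an instruction-tuned model labels raw task inputs, the resulting prompt/completion pairs become a task dataset, and a base completion model is fine-tuned on that dataset so that, at inference time, it only ever receives the untrusted input as data.

```python
import json
from openai import OpenAI

client = OpenAI()

# Illustrative task: the single capability the deployed model should have.
TEACHER_PROMPT = "Summarize the following email in one sentence."


def teacher_label(document: str) -> str:
    """Use an instruction-tuned teacher model to produce the desired output for one input."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable instruction-tuned model
        messages=[{"role": "user", "content": f"{TEACHER_PROMPT}\n\n{document}"}],
    )
    return resp.choices[0].message.content or ""


def build_dataset(raw_inputs: list[str], path: str = "task_dataset.jsonl") -> str:
    """Write prompt/completion pairs; the prompt is ONLY the raw input, with no instruction."""
    with open(path, "w") as f:
        for doc in raw_inputs:
            record = {
                "prompt": doc + "\n\n###\n\n",           # separator marks end of input
                "completion": " " + teacher_label(doc),  # teacher output as target
            }
            f.write(json.dumps(record) + "\n")
    return path


def finetune_task_model(dataset_path: str) -> str:
    """Fine-tune a non-instruction-tuned base model on the generated task dataset."""
    upload = client.files.create(file=open(dataset_path, "rb"), purpose="fine-tune")
    job = client.fine_tuning.jobs.create(
        training_file=upload.id,
        model="davinci-002",  # base completion model, not instruction-tuned
    )
    return job.id

# At inference time, the fine-tuned model is prompted with the raw (possibly
# adversarial) input alone. Because it was never trained to follow instructions,
# directives injected into that input are treated as data, not as commands.
```

The key design point is that the deployed model has no instruction channel at all: an injected "ignore previous instructions" string is just more text to summarize, which is why the attack success rate collapses even though task quality is preserved.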
Related Posts
- Novel Prompt Injection Threats to Application-Integrated Large Language Models
  Expanding AI Threat Landscape: Untrusted Data Injection Attacks on Application-Integrated LLMs
- AI Security is Probabilistic Security
  Emergent Challenges: Prompt Injections and Ensuring AI Security in an Unpredictable Landscape
- Chat Markup Language (ChatML)
  Establishing Conversational Roles and Addressing Syntax-Level Prompt Injections