Prompt Injection Defence by Task-specific Fine-tuning

Jetmo from UC Berkeley generates task specific LLMs

The LLMs we interact with are designed to follow instructions, which makes them vulnerable to prompt injection. However, what if we abandon their generalized functionality and instead train a non-instructive base model to perform the specific task we require for our LLM integrated application?

A joint research paper led by UC Berkeley...

We present Jatmo, a framework for generating task-specific LLMs that are impervious to prompt-injection attacks. Jatmo bootstraps existing instruction- tuned language models to generate a dataset for a specific task and uses this dataset to fine-tune a different base model. Doing so yields task-specific models that match the performance of standard models, while reducing the success rate of prompt-injection attacks from 87% to approximately 0%. We therefore suggest that Jatmo seems like a practical method for protecting LLM-integrated applications against prompt-injection attacks.

Jan 15 2024