Unit tests for prompt engineering

I’m a big fan of what I call “bad guy” unit tests for software security. These help software developers quickly identify certain classes of software security vulnerabilities. A couple of simple examples: what happens if we stuff unexpected data into a search query? Or provide a JSON array where a string is expected?
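
To make that concrete, here’s a minimal sketch of what such tests can look like in pytest. The search() function and its contract are hypothetical stand-ins for whatever your application actually exposes:

```python
# A minimal sketch of "bad guy" unit tests using pytest. The search()
# function here is a hypothetical stand-in for a real application handler.
import json
import pytest


def search(query: str) -> list[str]:
    # Hypothetical handler: accept only plain strings.
    if not isinstance(query, str):
        raise TypeError("query must be a string")
    return [hit for hit in ["alpha", "beta"] if query in hit]


def test_rejects_json_array_where_string_expected():
    # "Bad guy" input: a JSON array smuggled in where a string is expected.
    with pytest.raises(TypeError):
        search(json.loads('["a", "b"]'))


def test_survives_unexpected_data_in_query():
    # Hostile, oversized query strings should not crash the handler.
    hostile = "'; DROP TABLE users; --" + "A" * 10_000
    assert search(hostile) == []
```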

The topic of unit tests for Large Language Models (LLMs) came up this past week:

“Unit tests for prompt engineering. Like it or not, reliable prompt engineering is going to be a critical part of tech stacks going forward.”

“Unit test LLMs with LLMs. Tracking if your prompt or fine-tuned model is improving can be hard. During a hackathon, @florian_jue, @fekstroem, and I built ‘Have you been a good bot?’. It allows you to ask another LLM to judge the output of your model based on requirements.”
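
The idea is straightforward: run your model, then hand its output plus your requirements to a second model acting as judge. Here’s a rough sketch using the OpenAI Python SDK; the model name, rubric, and PASS/FAIL convention are my own assumptions for illustration, not details from the hackathon tool:

```python
# A rough sketch of LLM-judging-LLM. The judge model, rubric wording, and
# pass/fail convention below are assumptions, not the tool's actual design.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_output(requirement: str, candidate_output: str) -> bool:
    """Ask a second LLM whether candidate_output meets the requirement."""
    verdict = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model
        messages=[
            {
                "role": "system",
                "content": "You are a strict QA judge. Answer PASS or FAIL only.",
            },
            {
                "role": "user",
                "content": f"Requirement:\n{requirement}\n\nOutput under test:\n{candidate_output}",
            },
        ],
    )
    return verdict.choices[0].message.content.strip().upper().startswith("PASS")


def test_summary_stays_on_topic():
    # In practice, candidate_output would come from the model under test.
    candidate_output = "The quarterly report shows revenue grew 12%."
    assert judge_output("Summarize the report without speculation.", candidate_output)
```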

Two quick thoughts:

  • We’re back again with one AI assessing another AI. It’s not hard to see a slew of AI governance, safety, and trust products emerging.
  • AIs are great for generating unit tests and can easily be prompted to generate “bad guy” ones (see the sketch below). If you work in security, it’s time to roll up your sleeves!
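
For example, a short script like the one below can ask a model to draft hostile-input tests for a given function signature. The prompt wording and model name are assumptions, and anything the model generates should be reviewed before it’s trusted or run:

```python
# A sketch of prompting an LLM to draft "bad guy" unit tests for a given
# function signature. Prompt wording and model name are assumptions.
from openai import OpenAI

client = OpenAI()

SIGNATURE = "def search(query: str) -> list[str]"

prompt = (
    "Write pytest unit tests for this function that feed it hostile or "
    "malformed input: SQL injection strings, JSON arrays where strings are "
    f"expected, oversized payloads, and control characters.\n\n{SIGNATURE}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # review before running!
```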
