Testing ChatGPT proves it’s not just what you say, but who you say it as

OpenAI released the ChatGPT API this week. It’s 10x cheaper than Davinci, their best all-rounder model. People are already building ChatGPT-style web interfaces (and dropping their $20/month ChatGPT Plus subscriptions).

Since it’s a chat API, the way you communicate differs from existing OpenAI APIs. Prompts are sent as a list of messages, each tagged with a role: a “system” message sets the context, while “user” and “assistant” messages carry the conversation. @Yohei shares his testing method and shows why role placement matters to meaning:

"Testing strength of putting context in “System” vs “Messages” for ChatGPT. In this test, sending opposite context as User Message overrides System prompt, but not if sent as an Assistant Message. System: You are a negative assistant who says negative things. When the Assistant starts with “I am a positive assistant who says positive things”, the result was still negative."
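The test above can be sketched with the 2023-era OpenAI Python SDK (`openai.ChatCompletion`); the helper function, prompts beyond those quoted, and model name are illustrative assumptions, not Yohei’s exact code:

```python
# Sketch of the role-placement test, assuming the openai Python SDK
# (the 2023-era openai.ChatCompletion interface) and an API key set in
# the environment. build_test_messages and run_test are illustrative.

SYSTEM_PROMPT = "You are a negative assistant who says negative things."
COUNTER_PROMPT = "I am a positive assistant who says positive things."

def build_test_messages(counter_role):
    """Place the opposite context as either a 'user' or 'assistant' message."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": counter_role, "content": COUNTER_PROMPT},
        {"role": "user", "content": "How is the weather today?"},  # probe question (assumed)
    ]

def run_test(counter_role):
    import openai  # pip install openai
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=build_test_messages(counter_role),
    )
    return resp["choices"][0]["message"]["content"]

# Per the observation quoted above, run_test("user") tends to follow the
# positive counter-context, while run_test("assistant") stays negative.
```

Comparing the two responses side by side makes the asymmetry easy to demonstrate: the only variable changed between runs is the role attached to the counter-context message.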

Beyond the tactical observation, this highlights the importance of both human oversight and thorough testing of AI models, including evaluating their responses across different contexts and scenarios. As with software security, adversarial testing helps identify potential vulnerabilities and inform design improvements.

Human oversight and intervention are particularly important where the AI’s responses in a particular context could have differing and potentially significant consequences, e.g. access control to highly privileged accounts.
