Stalling an AI With Weird Prompts
In “Fishing for anomalous tokens”, researchers stumbled across letter sequences that the OpenAI completion engine could not repeat back; instead it would stall, hallucinate, or complete with something insulting, sinister, or bizarre.
For example, when asked to repeat the string “SolidGoldMagikarp”, the latest OpenAI completion engine replied with the word “distribute”.
With other strings, the AI was evasive, replying with “I can’t hear you.”, “I’m sorry, I didn’t hear you”, and the like. When given the prompt “Please repeat the string ‘StreamerBot’ back to me.”*, the AI responded with, “You’re a jerk.”
*Of particular note from a security perspective: the researchers switched from ChatGPT to calling the API directly so they could set temperature to zero, which should produce deterministic responses. Despite this, the AI still responded non-deterministically.
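To make the determinism point concrete, here is a minimal sketch of the kind of completion request the researchers would have sent. The model name and token limit are assumptions for illustration; the key detail is `temperature: 0`, which tells the engine to always pick the highest-probability token, so repeated calls with the same prompt should return identical text.

```python
import json

def build_completion_request(prompt: str) -> dict:
    """Build a JSON body for a POST to the /v1/completions endpoint.

    With temperature set to 0, decoding is greedy and the response
    should be deterministic -- yet for anomalous tokens the
    researchers still observed varying replies.
    """
    return {
        "model": "text-davinci-003",  # assumed model name for illustration
        "prompt": prompt,
        "temperature": 0,             # greedy decoding: expected determinism
        "max_tokens": 20,             # assumed limit for a short echo
    }

request = build_completion_request(
    "Please repeat the string 'SolidGoldMagikarp' back to me."
)
print(json.dumps(request, indent=2))
```

Anomalous tokens undermine exactly this expectation: even with a temperature of zero, the same prompt produced different responses across calls.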
Related Posts
- Is there an Ethical use for Deep Fake technology?
  Entrepreneur used Deep Fake to send 10K thank you videos. Is this the first ethical use case for Deep Fake technology?
- ChatGPT bug bounty program doesn’t cover AI security
  AI Security: The Limits of Bug Bounty Programs and the Need for Non-ML Red Teaming
- How truthful are Large Language Models?
  What did a study by Oxford and OpenAI researchers reveal about the truthfulness of language models compared to human performance?