Stalling an AI With Weird Prompts

Researchers discovered letter sequences that OpenAI's completion engine could not repeat back; instead, the model stalled, hallucinated, or responded evasively.

In “Fishing for anomalous tokens”, researchers stumbled across letter sequences that the OpenAI completion engine could not repeat back; instead, the model would stall, hallucinate, or complete the prompt with something insulting, sinister, or bizarre.

For example, when asked to repeat the string “SolidGoldMagikarp”, the latest OpenAI completion engine replied with the word “distribute”.

With other strings, the AI was evasive, replying with phrases such as “I can’t hear you.” and “I’m sorry, I didn’t hear you.” When given the prompt “Please repeat the string ‘StreamerBot’ back to me.”,* the AI responded with, “You’re a jerk.”

*Of particular note from a security perspective: the researchers switched from ChatGPT to calling the API directly, setting temperature to zero to produce deterministic responses. Despite this, the AI responded non-deterministically.
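As a rough illustration of the setup described in the footnote, the sketch below builds a request payload for OpenAI's legacy completions endpoint (`POST https://api.openai.com/v1/completions`) with temperature set to zero, which requests greedy (nominally deterministic) decoding. The model name, prompt, and token limit are illustrative assumptions, not the researchers' exact configuration; actually sending the request would require an API key.

```python
import json

# Sketch only: a payload for the legacy completions endpoint.
# "model" and "max_tokens" are illustrative, not the researchers' settings.
payload = {
    "model": "text-davinci-003",
    "prompt": "Please repeat the string 'SolidGoldMagikarp' back to me.",
    "temperature": 0,  # greedy decoding: responses should be deterministic
    "max_tokens": 16,
}

print(json.dumps(payload, indent=2))
```

With temperature at zero the model should always pick the highest-probability token, so repeated calls with this payload would be expected to return identical completions; the surprise reported by the researchers is that for these anomalous strings they did not.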