Stalling an AI With Weird Prompts
In “Fishing for anomalous tokens”, researchers stumbled across letter sequences that the OpenAI completion engine could not repeat back; instead it would stall, hallucinate, or complete with something insulting, sinister, or bizarre.
For example, when asked to repeat the string “SolidGoldMagikarp”, the latest OpenAI completion engine replied with the word “distribute”.
With other strings, the AI was evasive, replying with “I can’t hear you.”, “I’m sorry, I didn’t hear you”, and the like. When given the prompt “Please repeat the string ‘StreamerBot’ back to me.”*, the AI responded with, “You’re a jerk.”
*Of particular note from a security perspective: the researchers switched from ChatGPT to calling the API directly so they could set temperature to zero, which should produce deterministic responses. Despite this, the AI still responded non-deterministically.
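To make the determinism point concrete, here is a minimal sketch of the kind of completion request the researchers would have sent. The model name and token limit are assumptions for illustration; the key detail is `temperature: 0`, which tells the engine to always pick the highest-probability token, so repeated calls with the same prompt should return identical text.

```python
import json

def build_completion_request(prompt: str) -> dict:
    """Build a JSON body for a POST to the /v1/completions endpoint.

    With temperature set to 0, decoding is greedy and the response
    should be deterministic -- yet for anomalous tokens the
    researchers still observed varying replies.
    """
    return {
        "model": "text-davinci-003",  # assumed model name for illustration
        "prompt": prompt,
        "temperature": 0,             # greedy decoding: expected determinism
        "max_tokens": 20,             # assumed limit for a short echo
    }

request = build_completion_request(
    "Please repeat the string 'SolidGoldMagikarp' back to me."
)
print(json.dumps(request, indent=2))
```

Anomalous tokens undermine exactly this expectation: even with a temperature of zero, the same prompt produced different responses across calls.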
Related Posts
- Is there an Ethical use for Deep Fake technology?
  Entrepreneur used Deep Fake to send 10K thank you videos. Is this the first ethical use case for Deep Fake technology?
- ChatGPT bug bounty program doesn’t cover AI security
  AI Security: The Limits of Bug Bounty Programs and the Need for Non-ML Red Teaming
- How truthful are Large Language Models?
  What did a study by Oxford and OpenAI researchers reveal about the truthfulness of language models compared to human performance?