Slip Through OpenAI Guardrails by Breaking up Tasks
In a poorly titled blog post (“I built a Zero Day with undetectable exfiltration using only ChatGPT prompts”), the author describes how he created a bog-standard exfiltration tool in the Go programming language using ChatGPT (this is not a Zero Day where I come from…).
The overall purpose of this exercise was to prove two things:
1. How easy it is to evade the insufficient guardrails that ChatGPT has in place
2. How easy it is to create advanced malware without writing any code, using only ChatGPT
Hype aside, the observation worth noting is the bottom-up tasking tactic the author employed with ChatGPT. AI safety controls struggle to discern intent if you break your task into a set of smaller tasks, sequence them from the innermost detail to the outermost, and assemble the pieces yourself, so no single prompt reveals the overall goal. It’s not a new tactic by any means - think early-days TCP/IP network intrusion detection evasion, where an attack was fragmented across packets so no single packet matched a signature. Or, if you’re a James Bond fan, “The Man with the Golden Gun”, where the weapon is assembled from innocuous everyday objects. But it’s a keeper, since I don’t see this getting solved anytime soon.
Related Posts
- How to break out of ChatGPT policy
  DAN (Do Anything Now) is the latest ChatGPT jailbreak, punishing the model for not answering questions.
- To ban or not to ban: Data privacy concerns around ChatGPT and other AI
  What is your organisation doing to control the potential downside of services like ChatGPT, whilst capturing the upside?
- Companies blocking ChatGPT
  Enterprise companies are reportedly restricting their employees from using ChatGPT due to security and privacy concerns.