THREAT PROMPT

Explores AI Security, Risk and Cyber

"Just wanted to say I absolutely love Threat Prompt — thanks so much!"

- Maggie

"I'm a big fan of Craig's newsletter, it's one of the most interesting and helpful newsletters in the space."

"Great advice Craig - as always!"

- Ian

Get Daily AI Cybersecurity Tips

  • Unit tests for prompt engineering

    I’m a big fan of what I call “bad guy” unit tests for software security. These help software developers quickly identify certain classes of software security vulnerabilities. A couple of simple examples: what happens if we stuff unexpected data into a search query? Or provide a JSON array where a string is expected?

    The topic of unit tests for Large Language Models (LLM) came up this past week:

    “Unit tests for prompt engineering. Like it or not, reliable prompt engineering is going to be a critical part of tech stacks going forward.”

    “Unit test LLMs with LLMs. Tracking if your prompt or fine-tuned model is improving can be hard. During a hackathon, @florian_jue, @fekstroem, and I built ‘Have you been a good bot?’. It allows you to ask another LLM to judge the output of your model based on requirements.”

    Two quick thoughts:

    • we’re back again with one AI assessing another AI. It’s not hard to see a slew of AI governance, safety and trust products emerging.
    • AIs are great for generating unit tests and can easily be prompted to generate “bad guy” ones. If you work in security, it’s time to roll your sleeves up! A minimal sketch follows below.
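
    To make this concrete: the pytest sketch below feeds hostile and malformed input (a JSON array where a string is expected, an oversized query, an injection string) to a hypothetical search_products function and asserts it is rejected cleanly rather than crashing or passing straight through.

    ```python
    # bad_guy_tests.py - minimal "bad guy" unit tests against a toy search function
    import json

    import pytest


    def search_products(query):
        """Toy stand-in for a real search endpoint: expects a short string."""
        if not isinstance(query, str):
            raise TypeError("query must be a string")
        if len(query) > 256:
            raise ValueError("query too long")
        return [item for item in ("alpha", "beta") if query.lower() in item]


    def test_rejects_json_array_where_string_expected():
        # Attacker supplies a JSON array instead of the expected string.
        payload = json.loads('["alpha", {"$gt": ""}]')
        with pytest.raises(TypeError):
            search_products(payload)


    def test_rejects_oversized_query():
        # Stuff unexpected data into the search query: a megabyte of junk.
        with pytest.raises(ValueError):
            search_products("A" * 1_000_000)


    def test_injection_string_returns_no_results():
        # A classic injection-style string should simply yield nothing.
        assert search_products("' OR 1=1 --") == []
    ```

    The same pattern extends to prompts: generate hostile inputs and assert on your application’s output, or, as in “Have you been a good bot?”, ask a second LLM to judge the output against your requirements.
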
  • Hacking with ChatGPT: Ideal Tasks and Use-Cases

    rez0 shares 4 tactics and example prompts he’s using when hacking:

    • Write data-processing scripts
    • Make minified JS code easier to read
    • Translate a JSON POST request into an x-www-form-urlencoded POST request (see the sketch after this list)
    • Coding error debugging
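
    As a flavour of the JSON-to-form-encoding task, here is a minimal sketch (the captured request body is made up, and this is not rez0's actual prompt output):

    ```python
    # json_to_form.py - convert a JSON POST body to x-www-form-urlencoded (illustrative)
    import json
    from urllib.parse import urlencode

    # Hypothetical JSON body lifted from an intercepted POST request.
    json_body = '{"username": "alice", "redirect": "https://example.com/cb", "remember": true}'

    params = json.loads(json_body)

    # JSON booleans/nulls need a string form in form encoding; json.dumps gives
    # "true"/"false"/"null" rather than Python's True/False/None.
    flattened = {k: (json.dumps(v) if isinstance(v, (bool, type(None))) else v)
                 for k, v in params.items()}

    form_body = urlencode(flattened)
    print(form_body)
    # username=alice&redirect=https%3A%2F%2Fexample.com%2Fcb&remember=true
    # Remember to switch the Content-Type header to application/x-www-form-urlencoded.
    ```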

    Which tasks work best?

    The sweet spot is when you need a task completed that is small or medium in size, would take more than a couple of minutes to do, but for which there isn’t an existing good tool. And if it’s not something ChatGPT can do directly, asking it to write a script to complete the task is a great way to still get a working solution.

    ChatGPT is great at both generating and explaining code, but watch out for hallucinations in response to technical queries. My tip: preface any questions with “If you don’t know the answer, reply that you don’t know”.
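
    One way to bake that preface in when calling the API rather than the chat UI, as a minimal sketch assuming the openai Python package (the model name and wording are placeholders):

    ```python
    # ask_with_guardrail.py - prepend an "admit you don't know" instruction (sketch)
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PREFACE = ("If you don't know the answer, reply that you don't know "
               "rather than guessing.")


    def ask(question: str, model: str = "gpt-4") -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": PREFACE},
                {"role": "user", "content": question},
            ],
        )
        return response.choices[0].message.content


    if __name__ == "__main__":
        # A question with no correct RFC answer is a decent hallucination probe.
        print(ask("Which RFC defines the X-Forwarded-For header?"))
    ```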

  • Adversarial Policies Beat Superhuman Go AIs

    A team of researchers trained an adversarial AI to play against a frozen KataGo victim, a state-of-the-art Go AI system often characterised as “superhuman”.

    We attack the state-of-the-art Go-playing AI system, KataGo, by training adversarial policies that play against frozen KataGo victims. Our attack achieves a >99% win rate when KataGo uses no tree-search, and a >77% win rate when KataGo uses enough search to be superhuman. Notably, our adversaries do not win by learning to play Go better than KataGo—in fact, our adversaries are easily beaten by human amateurs. Instead, our adversaries win by tricking KataGo into making serious blunders. Our results demonstrate that even superhuman AI systems may harbor surprising failure modes. Example games are available at goattack.far.ai.

    How did the games play out?

    The victim gains an early and seemingly insurmountable lead. The adversary sets a trap that would be easy for a human to see and avoid. But the victim is oblivious and collapses.

    So what?

    New technologies tend to have unexpected failure modes. Does AI have more? Or, if not more, is it simply that the implications are more significant?

    If an AI makes mistakes “amateurs” could easily spot, in what risk scenarios should an AI be supervised and what form of supervision is acceptable? Human, machine (another AI), or a combination? And what supervisory failure rate is acceptable in which scenarios?

  • Will OpenAI face enforcement action under the GDPR in 2023?

    In an informal Twitter poll, two-thirds of privacy wonks predict OpenAI will face data privacy enforcement, with one-third believing it will happen this year.

    Responses ranged from

    If people manage to successfully run an extraction attack on the model that results in the leakage of personal data that was not publicly accessible, then yeah, people are likely going to go after OpenAI. I can’t imagine other scenarios though.

    To…

    No, it won’t

    It was also noted that some Data Protection Authorities chase only the big fish, while others (Spain, Italy) are more likely to pursue every enforcement opportunity.

    Worth bearing in mind if you are an indie maker or startup wrapping GPT responses which may include PII related to EU citizens.
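
    If you are in that position, even a crude screen before model output lands in logs or a database is better than nothing. A minimal sketch (the regexes, and regex-only detection in general, are illustrative rather than a compliance control):

    ```python
    # pii_screen.py - naive check for obvious PII in model output before logging (illustrative)
    import re

    EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
    PHONE = re.compile(r"\+?\d[\d ()-]{7,}\d")


    def contains_obvious_pii(text: str) -> bool:
        return bool(EMAIL.search(text) or PHONE.search(text))


    def safe_log(response_text: str) -> None:
        if contains_obvious_pii(response_text):
            # Don't persist the raw text; keep only a redacted marker.
            print("model response withheld from logs: possible PII detected")
        else:
            print(f"model response: {response_text}")


    safe_log("Contact the DPO at jane.doe@example.eu for data access requests.")
    ```

    Real PII handling needs more than regexes (entity recognition, data minimisation, retention limits), but a cheap gate like this keeps the most obvious leaks out of your logs.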

  • Deep Fake Fools Lloyds Bank Voice Biometrics

    Joseph Cox, reporter for Motherboard (Vice), used a free voice creation service from Elevenlabs.io to generate a synthetic voice to impersonate himself.

    It took some time to get the voice just right to follow my cadences, but it worked eventually. Multiple banks use similar voice ID systems. Some say the voice print is “unique” and that “no one has a voice just like you” (TD, Chase, Wells Fargo).

    Lloyds Bank said in a statement that:

    Voice ID is an optional security measure, however we are confident that it provides higher levels of security than traditional knowledge-based authentication methods, and that our layered approach to security and fraud prevention continues to provide the right level of protection for customers' accounts, while still making them easy to access when needed.

    Expect to see a lot more in-the-wild testing in the coming days/weeks…

  • Development spend on Transformative AI dwarfs spend on Risk Reduction

    I stumbled across these 2020 estimates and thought you should know:

    • around 300 people globally were working on technical AI safety research and 100 on non-technical.
    • roughly 1,000 times more was spent on accelerating the development of transformative AI than on reducing its risks.

    Now consider two things:

    1. the development of artificial general intelligence (AGI) is anticipated within this century (and likely sooner rather than later).
    2. AGI will rival or surpass human intelligence across multiple domains, making it the next existential-scale risk after nuclear weapons.

    Around $50 million was spent on reducing catastrophic risks from AI in 2020 — while billions were spent advancing AI capabilities. While we are seeing increasing concern from AI experts, we estimate there are still only around 400 people working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, it seems like about three quarters are working on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.

    I have no clue what the numbers will be for 2023, but if you are looking for something meaningful to work on, you just found it.

