THREAT PROMPT

Explores AI Security, Risk and Cyber

"Just wanted to say I absolutely love Threat Prompt — thanks so much!"

- Maggie

"I'm a big fan of Craig's newsletter, it's one of the most interesting and helpful newsletters in the space."

"Great advice Craig - as always!"

- Ian

Get Daily AI Cybersecurity Tips

  • Cyber Insurance providers asking about company use of AI

    With cyber, it’s always worth keeping tabs on what insurance companies do and say. In that spirit, this caught my eye…

    We are asking a lot of questions right now about AI. We’re asking them on applications, we’re asking if the organisation makes use of chat GPT in any sort of way. Because the other thing that we are concerned about is if Chat GPT is becoming a regular part of that business, and if people get used to how it is responding to things, which means a harder ability to detect when fraudulent emails are coming through and how to deal with them.

    When insurance companies begin to ask questions on prospects applications for insurance, it might be time to start developing a policy covering employee AI and to establish some guardrails to manage AI use within your organisation.

  • Debunking the risk of GPU Card theft

    If I rip a GPU card out of a machine, can I gain access to the AI model it was processing?

    Every now and then, whispers about potential AI risks start to gain traction, and I feel it’s essential we debunk the wrongheaded or esoteric ones so we keep our eyes trained on the risks that truly matter…such as the security of centralised hubs, which offer pre-trained models to download.

    the idea that you can just break into a data center and steal the model has a lot of memetic sticking power, but is stupid if you actually know anything about this topic. here’s a thread on how confidential computing works in the NVIDIA H100…

    Well worth a read if you want to understand better the security engineering in modern GPU cards and the working practices of cloud providers offering GPU enabled compute.

  • How to Backdoor Diffusion Models?

    Two researchers from IBM Research and Taiwanese National Tsing Hua University publish an eye-opening paper on backdooring the type of AI model used by popular image generation services:

    ...we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely gen- erating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model.

    The research - which draws from defensive watermarking - established that BadDiffusion backdoors are cheap and effective:

    Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models.

    They accidentally stumbled across a simple mitigation technique relevant to the inference stage (when a trained model is applied to new data to make predictions or decisions). By clipping the image at every step in the diffusion process, they defeat the backdoor whilst keeping the value of the model. Unfortunately, they conclude that this defence would not be effective against sophisticated and evolving backdoor attacks.

    What is diffusion?

    The key idea is that nodes are connected to each other in some way and can influence each other’s states over time through a diffusion process. Diffusion is typically achieved by iteratively updating the values of the nodes based on local information and the weighted influence of their neighbours. Diffusion can be used to model the spread of information or influence across a social network; to smooth out noise, enhance edges, extract features in images and video; and coordinate the behaviour of multiple agents or robots in a distributed system.

  • Do loose prompts sink ships?

    The UK National Cyber Security Centre published an article titled “ChatGPT and large language models: what’s the risk?”.

    The main risk highlighted is AI operators gaining access to our queries. But they also touched on the potential benefit (and risk!) to cyber criminals using an LLM as a “phone-a-friend” during a live network intrusion:

    LLMs can also be queried to advise on technical problems. There is a risk that criminals might use LLMs to help with cyber attacks beyond their current capabilities, in particular once an attacker has accessed a network. For example, if an attacker is struggling to escalate privileges or find data, they might ask an LLM, and receive an answer that’s not unlike a search engine result, but with more context. Current LLMs provide convincing-sounding answers that may only be partially correct, particularly as the topic gets more niche. These answers might help criminals with attacks they couldn’t otherwise execute, or they might suggest actions that hasten the detection of the criminal. Either way, the attacker’s queries will likely be stored and retained by LLM operators.

    If your organisation has problematic IT to patch or secure, you might exploit this as a defender…

    Place yourself in the shoes of a “lucky” script kiddie who gained a foothold on your enterprise network. Run nmap or some other popular network scanning tool on your internal network and find the network service banners for those hard-to-protect services. Next, ask ChatGPT if it can fingerprint the underlying technology and, if so, what network attacks it proposes. If the AI hallucinates, you may get some funny attack suggestions. But those suggestions become potential network detection signatures.

    “The SOC has detected threat ChatFumbler attempting a poke when they should peek”...

  • OpenAI GPT-4 System Card

    OpenAI announced GPT-4 - the newest and most capable large language model. This summary from @drjimfan tells us what’s different from GPT 3.5:

    • Multimodal: API accepts images as inputs to generate captions & analyses.
    • GPT-4 scores 90th percentile on BAR exam!!! And 99th percentile with vision on Biology Olympiad! Its reasoning capabilities are far more advanced than ChatGPT.
    • 25,000 words context: allows full documents to fit within a single prompt.
    • More creative & collaborative: generate, edit, and iterate with users on writing tasks.
    • There’re already many partners testing out GPT-4: Duolingo, Be My Eyes, Stripe, Morgan Stanley, Khan Academy … even Government of Iceland!

    The same week, the company published a 60-page System Card, a document that describes OpenAIs' due diligence and risk management efforts:

    This system card analyzes GPT-4, the latest LLM in the GPT family of models. First, we highlight safety challenges presented by the model’s limitations (e.g., producing convincing text that is subtly false) and capabilities (e.g., increased adeptness at providing illicit advice, performance in dual-use capabilities, and risky emergent behaviors). Second, we give a high-level overview of the safety processes OpenAI adopted to prepare GPT-4 for deployment.

    Look out for a summary with comments from me in a future edition.

  • Learn how hackers bypass GPT-4 controls with the first jailbreak

    Can an AI be kept in its box? Despite extensive guardrails and content filters, the first jailbreak was announced shortly after GPT-4 was made generally available:

    this works by asking GPT-4 to simulate its own abilities to predict the next token we provide GPT-4 with python functions and tell it that one of the functions acts as a language model that predicts the next token we then call the parent function and pass in the starting tokens this phenomenon is called token smuggling, we are splitting our adversarial prompt into tokens that GPT-4 doesn’t piece together before starting its output this allows us to get past its content filters every time if you split the adversarial prompt correctly

    The advances in LLM models leaked or officially deployed continue to reveal the disparity between the pace of AI model development and that of risk and control.

Page 10 of 18

Get Daily AI Cybersecurity Tips