THREAT PROMPT
Explores AI Security, Risk and Cyber
"Just wanted to say I absolutely love Threat Prompt — thanks so much!"
"I'm a big fan of Craig's newsletter, it's one of the most interesting and helpful newsletters in the space."
"Great advice Craig - as always!"
Get Daily AI Cybersecurity Tips
-
Artificial intelligence act: Council and Parliament strike a deal on the first rules for AI
The EU has formally agreed its AI Act and followed up with a useful Q&A page.
Here's the timeline for adoption:
...the AI Act shall enter into force on the twentieth day following that of its publication in the official Journal. It will be fully applicable 24 months after entry into force, with a graduated approach as follows:
- 6 months after entry into force, Member States shall phase out prohibited systems;
- 12 months: obligations for general purpose AI governance become applicable;
- 24 months: all rules of the AI Act become applicable including obligations for high-risk systems defined in Annex III (list of high-risk use cases);
- 36 months: obligations for high-risk systems defined in Annex II (list of Union harmonisation legislation) apply.
Wondering how the EU AI Act might impact your company?
I like the approach taken by hotseat AI: ask context-specific questions and get a plain-language answer underpinned by a legal trace.
-
How generative AI helps enforce rules within an online Telegram community
@levelsio posted about how generative AI helps him enforce rules within his Nomad online community on Telegram.
Every message is fed in real time to GPT-4. The estimated cost is USD 5 per month (~15,000 chat messages).
Look at the rules and imagine trying to enforce them the traditional way with keyword lists:
🎒 Nomad List's GPT4-based 🤖 Nomad Bot I built can now detect identity politics discussions and immediately 🚀 nuke them from both sides
Still the #1 reason for fights breaking out
This was impossible for me to detect properly with code before GPT4, and saves a lot of time modding
I think I'll open source the Nomad Bot when it works well enough
Other stuff it detects and instantly nukes (PS this is literally just what is sent into GPT4's API, it's not much more than this and GPT4 just gets it):
- links to other Whatsapp groups starting with wa.me
- links to other Telegram chat groups starting with t.me
- asking if anyone knows Whatsapp groups about cities
- affiliate links, coupon codes, vouchers
- surveys and customer research requests
- startup launches (like on Product Hunt)
- my home, room or apartment is for rent messages
- looking for home, room or apartment for rent
- identity politics
- socio-political issues
- United States politics
- crypto ICO or shitcoin launches
- job posts or recruiting messages
- looking for work messages
- asking for help with mental health
- requests for adopting pets
- asking to borrow money (even in emergencies)
- people sharing their phone number
I tried with GPT3.5 API also but it doesn't understand it well enough, GPT4 makes NO mistakes
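For the curious, here is a minimal sketch of what a one-shot moderation call like this might look like. It is not @levelsio's actual code or prompt: the `openai` Python client, the model name and the abbreviated rule list are all assumptions on my part.

```python
# Minimal sketch of a one-shot GPT-4 moderation check for a chat message.
# Assumptions: the official `openai` Python client (v1+), OPENAI_API_KEY set in
# the environment, and an abbreviated rule list - the real prompt isn't public.
from openai import OpenAI

client = OpenAI()

RULES = """Reply DELETE if the message breaks any rule below, otherwise reply OK.
- links to other Whatsapp groups starting with wa.me
- links to other Telegram chat groups starting with t.me
- affiliate links, coupon codes, vouchers
- identity politics or United States politics
- job posts, recruiting or looking-for-work messages
- asking to borrow money (even in emergencies)"""


def should_nuke(message: str) -> bool:
    """Return True if GPT-4 says the message should be deleted."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": RULES},
            {"role": "user", "content": message},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("DELETE")


if __name__ == "__main__":
    print(should_nuke("Join my new crypto presale: t.me/not-a-scam-honest"))
```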
"But Craig, this is just straightforward one-shot LLM querying. It can be trivially bypassed via prompt injection so someone could self-approve their own messages"
This is all true. But I share this to encourage security people to weigh risk/reward before jumping straight to "no" just because exploitation is possible.
What's the downside risk of an offensive message getting posted in a chat room? Naturally, this will depend on the liability carried by the publishing organisation. In this context, very low.
And whilst I agree that GPT-4 is harder to misdirect than GPT-3.5, it's still quite trivial to do so.
-
Prompt Injection Defence by Task-specific Fine-tuning
The LLMs we interact with are designed to follow instructions, which makes them vulnerable to prompt injection. But what if we abandon that generalised capability and instead fine-tune a non-instruction-tuned base model to perform the one specific task our LLM-integrated application requires?
A joint research paper led by UC Berkeley...
We present Jatmo, a framework for generating task-specific LLMs that are impervious to prompt-injection attacks. Jatmo bootstraps existing instruction-tuned language models to generate a dataset for a specific task and uses this dataset to fine-tune a different base model. Doing so yields task-specific models that match the performance of standard models, while reducing the success rate of prompt-injection attacks from 87% to approximately 0%. We therefore suggest that Jatmo seems like a practical method for protecting LLM-integrated applications against prompt-injection attacks.
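To make the idea concrete, here is a rough sketch of the general pattern, not the authors' code: an instruction-tuned teacher labels raw task inputs, and the resulting pairs fine-tune a base model that never receives free-form instructions at inference time. The `openai` client, the summarisation task and the file names are illustrative assumptions.

```python
# Rough sketch of the general pattern (not the authors' code): an instruction-tuned
# teacher model labels raw task inputs, and the resulting pairs are used to
# fine-tune a base (non-instruction-tuned) model. The deployed model is only ever
# given the raw input, so there is no instruction channel to inject into.
# Assumptions: `openai` Python client v1+, a hypothetical summarisation task,
# and a local task_inputs.txt file with one document per line.
import json

from openai import OpenAI

client = OpenAI()
TASK_PROMPT = "Summarise the following customer email in one sentence."


def teacher_label(document: str) -> str:
    """Ask the instruction-tuned teacher for the desired task output."""
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": TASK_PROMPT},
            {"role": "user", "content": document},
        ],
    )
    return resp.choices[0].message.content.strip()


def build_dataset(in_path: str = "task_inputs.txt",
                  out_path: str = "task_finetune.jsonl") -> None:
    """Write prompt/completion pairs; the prompt is the raw input only."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            doc = line.strip()
            if doc:
                dst.write(json.dumps({"prompt": doc,
                                      "completion": teacher_label(doc)}) + "\n")


if __name__ == "__main__":
    build_dataset()
    # The JSONL file is then used to fine-tune a base model with whatever
    # fine-tuning pipeline you prefer.
```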
-
My WAF is reporting a URL Access Violation.
If your WAF reports a URL Access Violation, check the file extension requested in the raw HTTP request.
If the request ends in '.js.map' it likely means a user has opened the developer tools in the browser (Chrome DevTools) whilst visiting your site.
When DevTools renders JavaScript in the Sources view, it requests the source map (.js.map) URL for the selected JavaScript file, which may generate a WAF alert.
Unfortunately, some WAF vendors flag this event as high severity. As a signal in isolation, consider it background noise.
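If you triage these alerts programmatically, a check along these lines is enough to downgrade them. This is an illustrative sketch with a hypothetical alert schema, not tied to any particular WAF product.

```python
# Illustrative triage sketch with a hypothetical alert schema (not tied to any
# specific WAF product): downgrade URL Access Violation alerts whose request
# path ends in ".js.map", since these are typically DevTools fetching source maps.
from urllib.parse import urlparse


def triage_url_violation(alert: dict) -> dict:
    """Expects an alert dict with 'url' and 'severity' keys (assumed schema)."""
    path = urlparse(alert.get("url", "")).path
    if path.endswith(".js.map"):
        alert["severity"] = "informational"
        alert["note"] = "Likely browser DevTools requesting a source map"
    return alert


print(triage_url_violation({"url": "https://example.com/static/app.js.map",
                            "severity": "high"}))
```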
-
AI Knows What You Typed
Side-channel attacks (SCAs) collect and interpret signals emitted by a device to reveal otherwise confidential information or operations.
Researchers at Durham, Surrey and Royal Holloway published a paper applying ML and AI to SCA:
With recent developments in deep learning, the ubiquity of microphones and the rise in online services via personal devices, acoustic side-channel attacks present a greater threat to keyboards than ever. This paper presents a practical implementation of a state-of-the-art deep learning model in order to classify laptop keystrokes, using a smartphone integrated microphone. When trained on keystrokes recorded by a nearby phone, the classifier achieved an accuracy of 95%, the highest accuracy seen without the use of a language model. When trained on keystrokes recorded using the video-conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium. Our results prove the practicality of these side-channel attacks via off-the-shelf equipment and algorithms. We discuss a series of mitigation methods to protect users against these series of attacks.
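The paper's model isn't reproduced here, but the general shape of such a classifier - isolated keystroke clips converted to log-mel spectrograms and fed to a small CNN - looks roughly like the sketch below. PyTorch/torchaudio, the 36-key class count and the layer sizes are my assumptions, not the authors'.

```python
# Rough shape of an acoustic keystroke classifier (not the paper's code): each
# isolated keystroke clip becomes a log-mel spectrogram and a small CNN predicts
# which key was pressed. Class count, sample rate and layer sizes are illustrative.
import torch
import torch.nn as nn
import torchaudio

N_KEYS = 36            # e.g. a-z plus 0-9 (assumption)
SAMPLE_RATE = 44_100

to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=SAMPLE_RATE, n_mels=64)


class KeystrokeCNN(nn.Module):
    def __init__(self, n_classes: int = N_KEYS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) of isolated keystroke audio
        spec = to_mel(waveform).unsqueeze(1)   # (batch, 1, n_mels, time)
        spec = torch.log(spec + 1e-6)          # log-mel features
        return self.classifier(self.features(spec).flatten(1))


# Smoke test with a fake batch of 0.2-second keystroke clips
logits = KeystrokeCNN()(torch.randn(8, int(0.2 * SAMPLE_RATE)))
print(logits.shape)  # torch.Size([8, 36])
```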
-
Local Inference Hardware
To run truly private AI on your own hardware, you need a suitable CPU or GPU. Small LLMs - or heavily quantised larger models - can run well on recent CPUs, but larger or less heavily quantised models need serious GPU power, and the two games in town are Nvidia and Apple.
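As a taste of how simple fully local inference has become, here is a minimal sketch using llama-cpp-python (my choice of runner; Ollama, LM Studio and others work similarly). The GGUF file name is a hypothetical quantised model you have downloaded yourself; nothing leaves your machine.

```python
# Minimal local-inference sketch using llama-cpp-python. The model path is a
# hypothetical quantised GGUF file downloaded in advance - no data leaves the box.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # hypothetical local file
    n_ctx=4096,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU/Metal if available, else run on CPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise this confidential memo: ..."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```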
Today, the most powerful Mac for LLM inference is the Mac Studio with the M2 Ultra and 192GB of unified memory. Released in June '23, the M2 Ultra is effectively two M2 Max dies joined by a very high-bandwidth interconnect. Apple watchers are suggesting the M3 Ultra could be released in June '24.
Given the pace of open-source LLM development and associated tooling, this may be worth waiting for if you are already in the Apple ecosystem and need strictly private inference at pace.
Seriously expensive, but if you have confidential workflows that could materially benefit from a fast, private assistant, this could pay for itself in relatively short order.