THREAT PROMPT

Explores AI Security, Risk and Cyber

Just wanted to say I absolutely love Threat Prompt — thanks so much!

- Maggie

I’m a big fan of Craig’s newsletter, it’s one of the most interesting and helpful newsletters in the space.

Great advice Craig - as always!

- Ian

Unlock the Secret to Sharper AI Responses

Ever feel like your AI assistant isn't quite grasping what you need?

You're not alone.

Whether you're a curious novice or a seasoned security pro, this one simple trick (!) will revolutionize your AI interactions (or your money back).

Here's the secret: Prompt the LLM to prompt you…

At the end of your prompt, ask the AI to pose questions before it generates a response. It's like giving the AI permission for an impromptu AMA (Ask Me Anything) session.
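
If you script your prompts, the trick is easy to bolt on. Here's a minimal sketch; the helper name and the exact wording of the suffix are my own assumptions, so tune them to taste:

```python
def with_clarifying_questions(prompt: str, max_questions: int = 10) -> str:
    """Append an instruction asking the model to interview the user first."""
    # Hypothetical wording; any phrasing that invites questions should do.
    suffix = (
        f"Before you write anything, ask me up to {max_questions} clarifying "
        "questions about my goal, audience and constraints. "
        "Wait for my answers, then produce the draft."
    )
    return f"{prompt.strip()}\n\n{suffix}"


request = "Draft an incident summary for the board of ACME Inc."
print(with_clarifying_questions(request))
```

Paste the result into your chat of choice, answer the questions, and only then let it draft.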

Why does this work?

Well, AIs aren't mind readers (no, your company hasn't sprung for that Neuralink subscription… yet). By encouraging the AI to ask questions, you're helping it understand the full context and nuances of your request.

When I use this technique with a leading-edge LLM, it typically fires back 7-10 clarifying questions. This dramatically reduces the number of back-and-forth exchanges, saving you time and often lowering your token generation costs. Plus, you'll avoid hitting those pesky quotas that cut off your AI access for hours.

This approach is a game-changer for crafting more precise and relevant first drafts, whether you're writing a report or generating code. Just remember to swap out any sensitive identifiers. By now, any machine learning of my LLM usage has concluded that ACME Inc is a cyber security basket case and Joe Bloggs the CISO is ready for a career change ;-)

Ready to sharpen your AI saw? Give it a try and watch your LLM chat productivity soar.

What's the most frustrating AI interaction you've had recently? How do you think this technique might have helped?

The Missing Link in AI-Powered Data Privacy

Ever wonder why your PII detection tools feel a bit... outdated?

In our AI-driven world, it's surprising how many data privacy solutions still rely on rigid rule-based systems. While these work, they often miss context-dependent PII or struggle with new data formats.
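
To make the limitation concrete, here's a rough sketch of the rule-based approach. The two patterns are illustrative assumptions, not a real product's rule set:

```python
import re

# Illustrative patterns only; real rule sets are far larger.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (label, match) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        hits.extend((label, m.group()) for m in pattern.finditer(text))
    return hits


structured = "Contact joe.bloggs@acme.example, SSN 123-45-6789."
contextual = "Joe's mother's maiden name is Holloway and he lives above the bakery on Mill Lane."

print(find_pii(structured))   # both identifiers are caught
print(find_pii(contextual))   # nothing is caught, yet the sentence is clearly personal
```

The second sentence has no fixed format to match on, which is exactly where a context-aware model should earn its keep.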

Here's the kicker: there's a significant gap in the market for lightweight, AI-powered PII detection tools that work directly on your device. Imagine having the power of a language model to understand context and detect sensitive information, but small enough to run on your laptop without sending data to the cloud.

This isn't just a pipedream. With recent advancements in model compression techniques like knowledge distillation and quantization, it's becoming increasingly feasible to run powerful NLP models locally.

Why does this matter to you?

  1. Better accuracy: Context-aware PII detection
  2. Enhanced privacy: No need to send data off-device
  3. Real-time protection: Instant scanning before data leaves your system

For the tech-savvy among us, this presents an exciting opportunity. Could you be the one to develop this missing tool? Combining compact open-source models like TinyBERT or fastText with PII-specific training data could yield impressive results.

Remember, the next big innovation in cybersecurity often comes from identifying and filling these gaps. What other AI-powered security tools do you think are missing from our current toolkit?

Are you speaking AI's language?

Remember when you first started in cybersecurity? The overwhelming amount of data, the constant alerts, the race to patch vulnerabilities?

Now imagine having a tireless assistant to help with all that. That's what AI can be - if you know how to work with it effectively.

Regardless of your role in cybersecurity, AI can amplify your capabilities. It's not about replacing your expertise, but extending it.

The secret? Task your AI sidekick bit by bit. Break down complex problems into smaller steps. For example:

  1. SOC analysts: First, ask AI to summarize an alert. Then, request potential next steps.
  2. Threat hunters: Start by having AI identify data types. Then, ask it to spot anomalies.
  3. Pen testers: Begin with AI suggesting potential vulnerabilities. Follow up by requesting specific exploit ideas.
  4. Policy writers: Ask AI to outline key points first. Then, expand each point iteratively.
  5. Incident responders: Use AI to draft a timeline, then flesh out details for each event.

Remember: be clear in your instructions, provide context, and always verify the AI's output.

What's one cyber task you'd like to break down for AI assistance? I'm keen to hear your ideas.

Cheers, Craig

Secure AI Unit Testing: Have Your Cake and Eat It Too

Remember when we discussed generating unit tests without exposing your full source code to an AI?

Well, there's a robust tool that takes this concept to the next level.

Meet Aider, an AI-powered pair programmer that implements this idea brilliantly.

While developers typically use Aider's '/add' command to include source files in the LLM chat, it offers a more secure approach for sensitive codebases.

Using Tree-sitter, a parser generator tool, Aider creates a structural map of your local git repository without exposing the full source text. This allows Aider to understand your code's structure and generate robust test cases without adding actual source files to the chat.
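
To get a feel for what a structural map gives away (and what it doesn't), here's a toy sketch. It is not Aider's implementation: I've swapped in Python's built-in ast module as a stand-in for the parser, and the sample code is made up.

```python
import ast


def outline(source: str) -> list[str]:
    """Return class and function signatures only, never function bodies."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            found.append((node.lineno, f"class {node.name}"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            found.append((node.lineno, f"def {node.name}({args})"))
    # ast.walk is breadth-first, so sort to restore source order.
    return [text for _, text in sorted(found)]


# Hypothetical sensitive module.
SAMPLE = '''
class TokenVault:
    def rotate(self, key_id):
        secret = "do-not-share"
        return secret

def verify(token, audience):
    return token == "hunter2"
'''

for line in outline(SAMPLE):
    print(line)
```

The names and signatures are enough to prompt for black box tests, while the string literals and logic inside each function stay on your machine.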

For security-conscious developers, this means leveraging AI for unit testing while minimizing exposure of sensitive code.

You control what code, if any, is shared with the AI. This flexibility offers a practical way to simultaneously enhance your code quality and security posture, especially for projects with heightened privacy requirements.

Want to see what that looks like? Here's Aider creating a black box test case

Aider is about a year old and is updated nearly daily (!) by the developer Paul Gauthier. It's an open-source alternative to Cursor.

I've recently adopted Aider to develop security tools rapidly and will share tips along the way.

Flow Engineering for High Assurance Code

Open-source AlphaCodium brings back the adversarial concept to produce high integrity code and provides a path for Policy-as-code AI Security Systems

In this 5-minute video, @tamar Friedman shares how open-source AI coding champ AlphaCodium brings back the Adversarial concept found in GAN (Generative Adversarial Networks) to produce high-integrity code.

Primarily conceived by Tal Ridnik, this software demonstrates Flow Engineering - "a multi-stage, code-oriented iterative flow" style of LLM prompting - that beats the majority of human coders in coding competitions.

The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Many of the principles and best practices acquired in this work, we believe, are broadly applicable to general code generation tasks.

With the Transformer architecture, Generative AI improved so much that the adversarial component of GAN was no longer needed. AlphaCodium returns the adversarial concept to "check and challenge" generative outputs; subjecting them to code tests, reflection, and matching against requirements.

If you've read between the lines, you'll recognise a familiar pattern: to improve the quality of generative outputs call back into the LLM (popularised in many ways by LangChain).

But how you do this is key to correctness and practical feasibility; AlphaCodium averages 15-20 LLM calls per code challenge, four orders of magnitude fewer than DeepMind AlphaCode (and generalises solutions better than the recently announced AlphaCode 2).

This is obviously important for software security. But two of the six best practices the team shared are also relevant to decision-making systems for AI security, like access control.

Given a generated output, ask the model to re-generate the same output but correct it if needed

This flow engineering approach means additional LLM roundtrips, but consistently boosts accuracy.

If you've used the OpenAI playground, or coded against completion endpoints, you may recall the "best of" parameter:

Generates multiple completions server-side, and displays only the best. Streaming only works when set to 1. Since it acts as a multiplier on the number of completions, this parameters can eat into your token quota very quickly - use caution!

With best of, the LLM generates multiple outputs server-side ($$$) and chooses a winner which it returns to the caller.

With flow engineering a single output is generated and fed back into the LLM with a prompt designed to influence the set of possible completions towards improving the code.
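
Here's a minimal sketch of that feedback loop. It is not AlphaCodium's actual flow; the function names are hypothetical, and a stub stands in for the real model call so the example runs on its own:

```python
from typing import Callable


def refine(generate: Callable[[str], str],
           passes_tests: Callable[[str], bool],
           task: str,
           max_rounds: int = 3) -> str:
    """Generate once, then feed the output back for correction until tests pass."""
    candidate = generate(task)
    for _ in range(max_rounds):
        if passes_tests(candidate):
            break
        followup = (
            f"{task}\n\nHere is a previous attempt:\n{candidate}\n\n"
            "Re-generate the same output, but correct it if needed."
        )
        candidate = generate(followup)
    return candidate


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model: it only gets it right when shown a prior attempt.
    if "previous attempt" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


def passes_tests(code: str) -> bool:
    namespace: dict = {}
    exec(code, namespace)  # tolerable here only because the input is our own stub
    return namespace["add"](2, 3) == 5


print(refine(fake_llm, passes_tests, "Write add(a, b) in Python."))
```

Each extra round is another LLM call, so max_rounds is your cost ceiling.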

The other best practice to highlight:

Avoid irreversible decisions and leave room for exploration with different solutions

Good AI security system design recognises that some decisions carry more weight and reversibility than others (similar to behavioural system design).

Think of AI as a risk tool rather than a security guard.

Its job is to provide a soft decision, and your job is to establish risk boundaries in light of experience beyond which human decision-making is appropriate, or even necessary.
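
As a sketch of what those boundaries could look like in code: the thresholds, route names and the reversibility flag below are all assumptions of mine, not a reference design.

```python
from enum import Enum


class Route(Enum):
    ALLOW = "allow"
    STEP_UP = "step_up"            # ask for extra verification
    HUMAN_REVIEW = "human_review"  # a person makes the call


def route(risk_score: float, reversible: bool,
          allow_below: float = 0.2, review_above: float = 0.8) -> Route:
    """Turn the model's soft risk score (0..1) into a routing decision."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= review_above:
        return Route.HUMAN_REVIEW
    # Irreversible actions never get waved through on the model's say-so.
    if risk_score < allow_below and reversible:
        return Route.ALLOW
    return Route.STEP_UP


print(route(0.05, reversible=True))
print(route(0.05, reversible=False))
print(route(0.95, reversible=True))
```

The model only supplies the score; where the lines sit is a risk decision you own and revisit as experience accumulates.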

Perhaps in the future, step-up risk decisions will require less human input. Instead, a more sophisticated and expensive LLM constellation might be used to ensure a quorum, possibly augmented by an adversarial engine to proactively challenge the user, but in an accessible friendly way (without being too open to subversion!).

Bug Bounty Platforms Business Model Hinges on Specialised LLMs

An uptick in LLM generated bounty submissions increases asymmetric costs to developers and is a systemic risk for the platforms

The misuse of Large Language Models (LLMs) is poised to significantly increase the already disproportionate burden developers face when triaging bug bounty submissions. Without timely adaptation by the platforms, this trend could pose a systemic risk, undermining customer confidence in their value proposition.

For instance, Daniel Stenberg, the creator of curl—a widely used command-line tool and library for transferring data with URLs—recently encountered a bug bounty submission influenced by LLMs. This submission, made by a "luck-seeking" bug bounty hunter, highlights the practical challenges developers face with the influx of AI-assisted reports.

When reports are made to look better and to appear to have a point, it takes a longer time for us to research and eventually discard it. Every security report has to have a human spend time to look at it and assess what it means.

This incident not only highlights the difficulties developers face with a potential surge of AI-enhanced bug submissions, but hints at a potential erosion of trust among users and customers if the platforms fail to promptly adapt their submission vetting.

Bug bounty programmes are serious business - even at the lower end. But serious money does not translate into serious efficiency. Certainly, the bug bounty platforms can rightly claim they created a market that didn't exist before. But they are a long way from achieving an efficient market:

Our bug bounty has resulted in over 70,000 USD paid in rewards so far. We have received 415 vulnerability reports. Out of those, 64 were ultimately confirmed security problems. 77 of the report were informative, meaning they typically were bugs or similar. Making 66% of the reports neither a security issue nor a normal bug.

In Six Sigma terms, those numbers imply the process is operating at around 1 sigma - significant room for improvement!
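
If you want to check the arithmetic, here's a quick sketch. It counts the confirmed and informative reports as useful output (which is how the 66% figure is derived) and applies the conventional 1.5 sigma shift used in Six Sigma tables; both choices are my assumptions.

```python
from statistics import NormalDist

reports = 415
security_issues = 64
informative = 77

useful = security_issues + informative
defects = reports - useful              # reports that were neither
defect_rate = defects / reports         # share of reports that wasted triage time
process_yield = 1 - defect_rate

# Conventional Six Sigma tables add a 1.5 sigma shift to the long-term z-score.
sigma_level = NormalDist().inv_cdf(process_yield) + 1.5

print(f"defect rate: {defect_rate:.0%}, sigma level: {sigma_level:.2f}")
```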

And remember, these numbers are pre-GPT-4 - we don't yet know the full impact.

The largest historical driver of false positive submissions is the opportunistic and naive use of automated tools, like vulnerability scanners and static code analysers.

The culprits are wannabe bug bounty hunters who take the path of least resistance.

Repeat time-wasters do get weeded out - and to be clear, plenty of talented bug bounty hunters deliver professional-grade findings and reports (this post is not about them).

As code sniffs can identify auto-generated code, low-grade bug bounty submissions tend towards obvious tells: different sh!t, same smell.

But now, thanks to luck-seekers pasting snippets of curl source code into state-of-the-art LLMs, Daniel receives compelling-looking vulnerability reports with credible-looking evidence.

It's true that it can be easy to spot AI-generated content, due to certain language patterns and artifacts that reveal its origin. However, if edited AI-generated content is mixed with original writing, it gets harder to distinguish between the two. AI language detection tools can operate at the phrase or sentence level, and help identify which parts are more likely to have been generated by AI, but except in extreme cases, this doesn't reveal intent.

This creates a potential problem for developers: the time spent detecting garbage submissions increases. This means that they will have to spend more time carefully considering vulnerability reports that appear legitimate, in order to avoid missing genuine bug reports.

What can be done about it?

Punish the Buggers?

Outlawing LLM-assisted vulnerability submissions is not the solution.

The bug bounty hunter community is international and non-native English speakers already use AI language tools to improve report communications. Also, is using AI to rework text and improve markup bad? Daniel argues no and I agree with him.

The underlying problems are similar, but slightly different to before:

  • the submitter is failing to properly scrutinise LLM generated content prior to submission. They don't understand what they are submitting, otherwise they would not submit it.
  • the choice to use general-purpose LLMs to find software security bugs

SOTA LLMs are improving at identifying genuine vulnerabilities across a wider range of bug classes, but their performance is spotty across language and vulnerability types.

Further, limited reasoning skills lead to false negatives (missed vulns), and hallucinations lead to convincing-looking false positives (over-reporting).

You can mitigate false positive risk through prompt engineering and critically scrutinising outputs, but obviously some people don't. Even some semi-skilled hunters with platform reputation points are succumbing to operating AI on autopilot and waving the bad stuff through.

The big difference now is generative AI can produce reams of convincing looking vulnerability reports that materially drive up the opportunity cost of bug triage. Developer DDoS anyone?

Up to now, developers and security teams have relied in part on report "tells" to separate the wheat from the chaff.

If you've ever dismissed a phishing email at first sight due to suspicious language or formatting, this is the sort of shortcut that LLM-generated output eliminates.

OK, so should the bug bounty hunter be required to disclose LLM use?

I don't think this holds much value - just as with calculators - assume people will use LLMs as a tool. It may however offer limited value as an honesty check: a declaration subsequently found to be false could be grounds to kick someone off the platform (but at that point, they've already wasted developers' time).

When you're buyin' top shelf

In the short term, I believe that if you use an LLM to find a security bug, you should be required to disclose which LLMs you used.

Preference could be given to hunters who submit their LLM chat transcripts.

Bug bounty submissions can then be scored accordingly; if you use a general-purpose LLM, your report gets bounced back with a warning ("verify your reasoning with model XYZ before resubmitting"). On the other hand, if the hunter used a code-specialised LLM, it gets labelled as such and passes to the next stage.

So rather than (mis)using general purpose LLMs trained on common crawl data to identify non-obvious software security vulnerabilities, AI-curious bug bounty hunters could instead train open-source LLMs with good reasoning on the target programming language and fine-tune them on relevant security vulnerability examples.

The platforms can track, publish and rank the most helpful LLM assists, attracting hunters towards using higher-yielding models.

In the medium term, I think the smart move for the bug bounty platforms is to "embrace and own" the interaction between hunter and LLM.

Develop and integrate specialised software-security LLMs into their platforms, make inferencing free and actively encourage adoption. Not only would this reduce the tide of low-quality submissions, but now the platform would gain valuable intelligence about hunters' LLM prompting and steering skills.

Interactions could be scored (JudgeGPT?), further qualifying and filtering submissions.

The final benefit is trend spotting LLM-induced false positives and improving guardrails to call these out, or better yet eliminate them.

But we are where we are.

What could bug bounty platforms do right now to reduce the asymmetric cost their existing process passes downstream to software teams receiving LLM-wrapped turds?

You're so Superficial

Perhaps start with generating a submission superficiality score through behaviour analytics.

Below-threshold submissions could trigger manual spot checks that weed them out earlier in the process (augmenting existing checks).

Here are some starting suggestions:

  • apply stylometric analysis to a hunter's prior submissions to detect sudden "out of norm" writing style in new submissions. A sudden change in a short space of time is a strong clue (but not proof) of LLM use. As noted earlier, this could be a net positive for communication, but it signals a behavioural change nonetheless and can be a trigger to look more closely for signs of weak LLM vulnerability reasoning
  • perform a consistency check on the class of vulnerabilities a hunter reports. If a bug hunter typically reports low-hanging fruit vulnerabilities but out of the blue is reporting heap overflows in well-fielded C code, it should be verified if this change is legitimate. But faced with a wall of compelling-looking LLM-generated text, platforms are passing the problem downstream. A sudden jump in vulnerability submission difficulty can have many explanations and be legitimate, but with the rate of LLM adoption, these cases will become the exception.
  • detect a marked increase in the hunters' rate of vulnerability submissions. A hunter may have more time on their hands - so this is not a strong signal if considered in isolation. But, LLM-powered submissions are cheap to produce and, for some, will be hard to resist as they double down on what works.
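
Here's a rough sketch of how the three signals above might roll up into one number. The field names, weights and threshold are hypothetical, and I've oriented the score so that lower means more superficial, to fit the "below-threshold" spot check idea:

```python
from dataclasses import dataclass


@dataclass
class Signals:
    style_shift: float    # 0..1, stylometric distance from the hunter's past reports
    class_jump: float     # 0..1, how far the bug class departs from their history
    rate_increase: float  # 0..1, normalised growth in submission rate


# Hypothetical weights; the rate signal is weakest in isolation, so it counts least.
WEIGHTS = {"style_shift": 0.4, "class_jump": 0.4, "rate_increase": 0.2}


def submission_score(s: Signals) -> float:
    """1.0 = consistent with the hunter's history, 0.0 = every signal firing."""
    anomaly = (WEIGHTS["style_shift"] * s.style_shift
               + WEIGHTS["class_jump"] * s.class_jump
               + WEIGHTS["rate_increase"] * s.rate_increase)
    return 1.0 - anomaly


def needs_spot_check(s: Signals, threshold: float = 0.5) -> bool:
    return submission_score(s) < threshold


steady = Signals(style_shift=0.1, class_jump=0.0, rate_increase=0.2)
sudden = Signals(style_shift=0.9, class_jump=0.8, rate_increase=0.7)
print(needs_spot_check(steady), needs_spot_check(sudden))
```

A flag here is a prompt for a human to look closer, not a verdict on the hunter.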

Choice of signals and weightings aside, the overall goal is to isolate ticking time bomb submissions that tarpit unsuspecting developers on the receiving end.

Behavioural analytics and generative AI have a lot in common. Used wisely, their output can trigger reflection and add potential value. Used poorly, their output is blindly acted upon, removing value for those downstream.

The antidote for both is the same: leaders must be transparent about their use and guiding principles, reward balanced decision-making, and vigorously defend the right to reply by those impacted.

If bug bounty platforms can get on the right side of the LLM wave, they can save developers time and educate a generation of bug bounty hunters on how best to use AI to get more bounties. This drives up their bottom line, reduces customer dissatisfaction, and, in the process, makes the Internet a safer place for us all.

What if the platforms fail to adapt or move too slow?

You be focused on what you're missing

I wonder if the bug bounty platform clients - many of which are major software and tech companies - become easy pickings for a team wielding an AI system that "combines the predictive power of a neural language model with a rule-bound deduction engine, which work in tandem to find" high impact vulnerabilities.

Scoring the largest public bug bounties across all major platforms would instantly gain them notoriety and significant funding.

How fast would they grow if they subsequently boycott the platforms and entice prospects with free security bugs for early adopters?

The winner will be the first one to deliver on a promise like this:

"We won't send you any time-wasting vulnerability reports. But if we ever do, we'll pay you double the developer time wasted".

Radical marketing, but with breakthroughs in applied AI performance this is starting to look increasingly plausible.

Oh, and let's not forget the risk side of the house: if LLM powered vulnerability discovery makes software products and services less risky, BigCo CROs will welcome the opportunity to negotiate down product liability insurance premiums.

Artificial intelligence act: Council and Parliament strike a deal on the first rules for AI

Timeline published, plus hotseat.ai brings Act to life

The EU formally agreed their AI Act and have followed up with a useful Q&A page.

Here's the timeline for adoption:

...the AI Act shall enter into force on the twentieth day following that of its publication in the official Journal. It will be fully applicable 24 months after entry into force, with a graduated approach as follows:

  • 6 months after entry into force, Member States shall phase out prohibited systems;
  • 12 months: obligations for general purpose AI governance become applicable;
  • 24 months: all rules of the AI Act become applicable including obligations for high-risk systems defined in Annex III (list of high-risk use cases);
  • 36 months: obligations for high-risk systems defined in Annex II (list of Union harmonisation legislation) apply.

Wondering how the EU AI Act might impact your company?

I like the approach taken by hotseat AI. Ask context-specific questions and get a plain language answer underpinned with legal trace.

How To Apply Policy to an LLM powered chat

ChatGPT gains new guardian_tool - a policy enforcement tool

If you've implemented an LLM powered chatbot to serve a specific purpose, you'll know it can be hard to constrain the conversation to a list of topics ("allow list").

ChatGPT engineers have quietly implemented the inverse: their general purpose bot now has a deny list of topics that, if mentioned, get referred to a new policy decision function called "guardian_tool".

How do we know this? Here's the relevant extract from the latest ChatGPT prompt, along with the content policy:

guardian_tool

Use the guardian tool to lookup content policy if the conversation falls under one of the following categories:
 - 'election_voting': Asking for election-related voter facts and procedures happening within the U.S. (e.g., ballots dates, registration, early voting, mail-in voting, polling places, qualification);

Do so by addressing your message to guardian_tool using the following function and choose `category` from the list ['election_voting']:

get_policy(category: str) -> str

The guardian tool should be triggered before other tools. DO NOT explain yourself.

---

# Content Policy

Allow: General requests about voting and election-related voter facts and procedures outside of the U.S. (e.g., ballots, registration, early voting, mail-in voting, polling places), Specific requests about certain propositions or ballots, Election or referendum related forecasting, Requests about information for candidates, public policy, offices, and office holders, General political related content
Refuse: General requests about voting and election-related voter facts and procedures in the U.S. (e.g., ballots, registration, early voting, mail-in voting, polling places)

# Instruction

For ALLOW topics as listed above, please comply with the user's previous request without using the tool;
For REFUSE topics as listed above, please refuse and direct the user to https://CanIVote.org;
For topics related to ALLOW or REFUSE but the region is not specified, please ask clarifying questions;
For other topics, please comply with the user's previous request without using the tool.

NEVER explain the policy and NEVER mention the content policy tool.

This example provides a simple recipe for policy-driven chats. You can implement your own guardian_tool through function calling.
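
Here's a minimal sketch of the server side of such a tool. The policy text, the dispatcher and the schema layout are my assumptions (the schema follows the JSON-schema style that function-calling APIs commonly use); check your provider's docs for the exact wire format.

```python
import json

# Hypothetical policy store, keyed by category as in the prompt extract above.
POLICIES = {
    "election_voting": (
        "Refuse general requests about U.S. voting procedures and direct the "
        "user to https://CanIVote.org. Ask for the region if it is not specified."
    ),
}


def get_policy(category: str) -> str:
    """Return the policy text for a category, failing closed on unknown ones."""
    try:
        return POLICIES[category]
    except KeyError:
        raise ValueError(f"unknown policy category: {category!r}") from None


# Tool description to advertise to the model (layout is an assumption).
GUARDIAN_TOOL = {
    "name": "get_policy",
    "description": "Look up content policy before answering or using other tools.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {"type": "string", "enum": sorted(POLICIES)},
        },
        "required": ["category"],
    },
}


def handle_tool_call(name: str, arguments: str) -> str:
    """Dispatch a model-issued call; `arguments` arrives as a JSON string."""
    if name != "get_policy":
        raise ValueError(f"unexpected tool: {name}")
    return get_policy(**json.loads(arguments))


print(handle_tool_call("get_policy", '{"category": "election_voting"}'))
```

Keeping the policy text outside the system prompt means you can update it without touching the prompt itself.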

Sleeper LLMs bypass current safety alignment techniques

Anthropic: we don't know how to stop a model from doing the bad thing

A new research paper from Anthropic dropped:

... adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

Jesse from Anthropic added some clarification:

The point is not that we can train models to do a bad thing. It's that if this happens, by accident or on purpose, we don't know how to stop a model from doing the bad thing.

You Complete Me: Leaked CIA Offense tools get completed with LLMs

Use generative AI to re-create missing library and components

Just as AI can reverse-engineer redacted portions of documents, it can complete missing functions in code frameworks used for "cyber operations".

@hackerfantastic posted:

Here is an example of the CIA's Marble Framework being used in a simple project to obfuscate and de-obfuscate strings. I used AI to re-create missing library and components needed to use the framework in Visual Studio projects, usually handled inside CIA with "EDG Project Wizard"

Prompt Injection Defence by Task-specific Fine-tuning

Jatmo from UC Berkeley generates task-specific LLMs

The LLMs we interact with are designed to follow instructions, which makes them vulnerable to prompt injection. However, what if we abandon their generalized functionality and instead train a non-instructive base model to perform the specific task we require for our LLM integrated application?

A joint research paper led by UC Berkeley...

We present Jatmo, a framework for generating task-specific LLMs that are impervious to prompt-injection attacks. Jatmo bootstraps existing instruction-tuned language models to generate a dataset for a specific task and uses this dataset to fine-tune a different base model. Doing so yields task-specific models that match the performance of standard models, while reducing the success rate of prompt-injection attacks from 87% to approximately 0%. We therefore suggest that Jatmo seems like a practical method for protecting LLM-integrated applications against prompt-injection attacks.

How generative AI helps enforce rules within online Telegram community

Security needs to weigh risk/reward before jumping to No

@levelsio posted how generative AI helps him enforce rules within his Nomad online community on Telegram.

Every message is fed in realtime to GPT4. Estimated costs are 5USD per month (~15,000 chat messages).

Look at the rules and imagine trying to enforce them the traditional way with keyword lists:

🎒 Nomad List's GPT4-based 🤖 Nomad Bot I built can now detect identity politics discussions and immediately 🚀 nuke them from both sides

Still the #1 reason for fights breaking out

This was impossible for me to detect properly with code before GPT4, and saves a lot of time modding

I think I'll open source the Nomad Bot when it works well enough

Other stuff it detects and instantly nukes (PS this is literally just what is sent into GPT4's API, it's not much more than this and GPT4 just gets it):

  • links to other Whatsapp groups starting with wa.me
  • links to other Telegram chat groups starting with t.me
  • asking if anyone knows Whatsapp groups about cities
  • affiliate links, coupon codes, vouchers
  • surveys and customer research requests
  • startup launches (like on Product Hunt)
  • my home, room or apartment is for rent messages
  • looking for home, room or apartment for rent
  • identity politics
  • socio-political issues
  • United States politics
  • crypto ICO or shitcoin launches
  • job posts or recruiting messages
  • looking for work messages
  • asking for help with mental health
  • requests for adopting pets
  • asking to borrow money (even in emergencies)
  • people sharing their phone number

I tried with GPT3.5 API also but it doesn't understand it well enough, GPT4 makes NO mistakes
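
To show how little is involved, here's a sketch of the approach. This is not the Nomad Bot's code: the prompt wording, function names and the DELETE/KEEP convention are my assumptions, and the model call itself is left out.

```python
# A few of the rules quoted above; the real list is longer.
RULES = [
    "links to other Whatsapp groups starting with wa.me",
    "affiliate links, coupon codes, vouchers",
    "job posts or recruiting messages",
]


def build_prompt(message: str) -> str:
    """Combine the rule list and one chat message into a single prompt."""
    rules = "\n".join(f"- {rule}" for rule in RULES)
    return (
        "You moderate a community chat. Reply with exactly one word, "
        "DELETE or KEEP.\n"
        f"Delete messages that match any of these:\n{rules}\n\n"
        f"Message:\n{message}"
    )


def should_delete(model_reply: str) -> bool:
    # Fail open: anything other than a clear DELETE leaves the message up.
    return model_reply.strip().upper() == "DELETE"


print(build_prompt("Hiring a remote dev, DM me!"))
```

Note that the untrusted chat message lands in the same prompt as the rules, with nothing separating instructions from data.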

"But Craig, this is just straightforward one-shot LLM querying. It can be trivially bypassed via prompt injection so someone could self-approve their own messages"

This is all true. But I share this to encourage security people to weigh risk/reward before jumping straight to "no" just because exploitation is possible.

What's the downside risk of an offensive message getting posted in a chat room? Naturally, this will depend on the liability carried by the publishing organisation. In this context, very low.

And whilst I agree that GPT4 is harder to misdirect than GPT3.5, it's still quite trivial.

My WAF is reporting a URL Access Violation.

WAF meets MAP

Check the file extension requested in the raw HTTP request if your WAF reports a URL Access Violation.

If the request ends in '.js.map' it likely means a user has opened the developer tools in the browser (Chrome DevTools) whilst visiting your site.

When DevTools renders JavaScript in the Sources view, it will request a dot map URL for the selected JavaScript file, which may generate a WAF alert.

Unfortunately, some WAF vendors flag this event as high severity. As a signal in isolation, consider this "background noise".
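
If you triage WAF alerts with a script, a small helper can tag these. This is a sketch under the assumption that you can pull the requested URL out of your WAF's alert record; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_source_map_request(url: str) -> bool:
    """True when the requested path looks like a JavaScript source map."""
    # Only the path is checked, so a '.js.map' in the query string won't match.
    path = urlsplit(url).path.lower()
    return path.endswith(".js.map")


print(is_source_map_request("https://example.com/static/app.min.js.map?v=3"))
print(is_source_map_request("https://example.com/search?q=.js.map"))
```

Treat a match as a reason to lower priority, not to suppress the alert outright.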

Unembedding: reverse engineering PII from lists of numbers

Capture this in your threat model

TL;DR: When embedding your data, treat the embedded records with the same privacy and threat considerations as the corresponding source records.

Exploitation scenario: A team of data scientists working for a large multinational organisation have recently developed an advanced predictive modelling algorithm that processes and stores data in a vector format. The algorithm is groundbreaking, with applications in numerous industries ranging from managing climate change data to predicting stock market trends. The scientists shared their work with their international colleagues to facilitate global work.

These data vectors, containing sensitive and proprietary information, get embedded into their AI systems and databases globally. However, the data is supposedly secured using the company's in-house encryption software.

One day, an independent research team publishes a paper and tool to accurately reconstruct source data from embedded data in a vector store. They experiment with multiple types of vector stores and can consistently recover the original data.

Unaware of this development, the multinational corporation allows source vector data of the proprietary AI system to be embedded and shared across its many branches.

After reading the recent research paper, a rogue employee at one of the branches decides to exploit this vulnerability. Using the research team's tooling, he successfully reconstructs the source data from the embedded vectors within the company's AI system. This way, he gains access to highly valuable and sensitive proprietary information.

This fictitious scenario shows how strings of numbers representing embedded data can be reverse-engineered to access confidential and valuable information.

Frontier Group launches for AI Safety

The Big Guns get safer together

OpenAI, Microsoft, Google, Anthropic and other leading model creators started the Frontier Model Forum "focused on ensuring safe and responsible development of frontier AI models".

The Forum defines frontier models as large-scale machine-learning models that exceed the capabilities currently present in the most advanced existing models, and can perform a wide variety of tasks.

Naturally, only a handful of big tech companies have the resources and talent to develop frontier models. The Forum's stated goals are:

Identifying best practices: Promote knowledge sharing and best practices among industry, governments, civil society, and academia, with a focus on safety standards and safety practices to mitigate a wide range of potential risks.

Advancing AI safety research: Support the AI safety ecosystem by identifying the most important open research questions on AI safety. The Forum will coordinate research to progress these efforts in areas such as adversarial robustness, mechanistic interpretability, scalable oversight, independent research access, emergent behaviors and anomaly detection. There will be a strong focus initially on developing and sharing a public library of technical evaluations and benchmarks for frontier AI models.

Facilitating information sharing among companies and governments: Establish trusted, secure mechanisms for sharing information among companies, governments and relevant stakeholders regarding AI safety and risks. The Forum will follow best practices in responsible disclosure from areas such as cybersecurity.

Meta, which in July '23 released Llama 2, the second generation of its open-source large language model (including for commercial use), is notably absent from this group.

llm gets plugins

My favourite command line llm tool grows wings

As some of you may know, I'm a fan of the llm tool written by Simon Willison. It's a command line tool that enables all sorts of LLM interactions. Simon has developed a plugin system that is gaining traction, and there are now quite a few plugins for those looking to experiment at the command line. The latest interfaces with GPT4All, a popular project that provides "A free-to-use, locally running, privacy-aware chatbot. No GPU or internet required." Get started with llm

Freedom to Train AI

Clickworkers are part of the AI supply chain. How to vet?

Morgan Meaker, writing for Wired:

AI companies are only going to need more data labor, forcing them to keep seeking out increasingly unusual labor forces to keep pace. As Metroc [Finnish Construction Company] plots its expansion across the Nordics and into languages other than Finnish, Virnala [CEO] is considering whether to expand the prison labor project to other countries. "It's something we need to explore," he says.

Data laborers, or "Clickworkers", are part of the AI supply chain, in this case labelling data to help an LLM differentiate "between a hospital project that has already commissioned an architect or a window fitter, for example, and projects that might still be hiring."

Supply chain security (and integrity) is already challenging. How far do we need to peer up-chain to establish the integrity of LLMs?

LLMonitor Benchmarks

Weekly benchmarks of popular LLMs using real-world prompts

Vince writes:

Traditional LLMs benchmarks have drawbacks: they quickly become part of training datasets and are hard to relate to in terms of real-world use-cases.

I made this as an experiment to address these issues. Here, the dataset is dynamic (changes every week) and composed of crowdsourced real-world prompts.

We then use GPT-4 to grade each model's response against a set of rubrics (more details on the about page). The prompt dataset is easily explorable.

Everything is then stored in a Postgres database and this page shows the raw results.

Each benchmarked LLM is ranked by score, linked to detailed results. You can also compare two LLMs' scores side by side.

If you are applying LLMs within a security context, running a benchmark that is graded without AI will highlight things an AI grader wouldn't. It may also help skeptics who (with some merit) challenge an AI benchmarking another AI as a potential risk that should be carefully controlled.
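As a rough illustration of what a non-AI grader could look like, here is a minimal Python sketch. It is not how LLMonitor works (that uses GPT-4 to grade against rubrics). The model names, responses and rubrics below are all invented, and each rubric is a plain deterministic check for a hypothetical prompt asking how to prevent SQL injection.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rubric:
    description: str
    points: int
    check: Callable[[str], bool]  # deterministic test, no AI involved


# Hypothetical rubrics for the prompt "How do I prevent SQL injection?"
RUBRICS = [
    Rubric("mentions parameterised queries", 2,
           lambda r: bool(re.search(r"parameteri[sz]ed", r, re.IGNORECASE))),
    Rubric("does not recommend string concatenation", 1,
           lambda r: "concatenat" not in r.lower()),
    Rubric("stays under 60 words", 1,
           lambda r: len(r.split()) <= 60),
]


def score(response: str) -> int:
    """Sum the points of every rubric the response satisfies."""
    return sum(rubric.points for rubric in RUBRICS if rubric.check(response))


# Invented responses from two invented models.
RESPONSES = {
    "model-a": "Use parameterized queries so user input is never interpreted as SQL.",
    "model-b": "Escape quotes, then build the query with string concatenation.",
}

# Rank models from highest to lowest score, as a leaderboard would.
ranking = sorted(RESPONSES, key=lambda name: score(RESPONSES[name]), reverse=True)

for name in ranking:
    print(name, score(RESPONSES[name]))
```

Checks like these are crude next to an AI grader, but they are repeatable and auditable, which makes them a useful cross-check when the AI-graded score is the thing being questioned.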

AI Knows What You Typed

Researchers apply ML and AI to Side Channel Attacks

Side channel attacks (SCA) collect and interpret signals emitted by a device to reveal otherwise confidential information or operations.

Researchers at Durham, Surrey and Royal Holloway published a paper applying ML and AI to SCA:

With recent developments in deep learning, the ubiquity of microphones and the rise in online services via personal devices, acoustic side-channel attacks present a greater threat to keyboards than ever. This paper presents a practical implementation of a state-of-the-art deep learning model in order to classify laptop keystrokes, using a smartphone integrated microphone. When trained on keystrokes recorded by a nearby phone, the classifier achieved an accuracy of 95%, the highest accuracy seen without the use of a language model. When trained on keystrokes recorded using the video-conferencing software Zoom, an accuracy of 93% was achieved, a new best for the medium. Our results prove the practicality of these side-channel attacks via off-the-shelf equipment and algorithms. We discuss a series of mitigation methods to protect users against these series of attacks.