THREAT PROMPT

Explores AI Security, Risk and Cyber

"Just wanted to say I absolutely love Threat Prompt — thanks so much!"

- Maggie

"I'm a big fan of Craig's newsletter, it's one of the most interesting and helpful newsletters in the space."

"Great advice Craig - as always!"

- Ian

Get Daily AI Cybersecurity Tips

  • Flow Engineering for High Assurance Code

    In this 5-minute video, Itamar Friedman shares how open-source AI coding champ AlphaCodium brings back the adversarial concept found in GANs (Generative Adversarial Networks) to produce high-integrity code.

    Primarily conceived by Tal Ridnik, this software demonstrates Flow Engineering - "a multi-stage, code-oriented iterative flow" style of LLM prompting - that beats the majority of human coders in coding competitions.

    The proposed flow consistently and significantly improves results. On the validation set, for example, GPT-4 accuracy (pass@5) increased from 19% with a single well-designed direct prompt to 44% with the AlphaCodium flow. Many of the principles and best practices acquired in this work, we believe, are broadly applicable to general code generation tasks.

    With the Transformer architecture, generative AI improved so much that the adversarial component of GANs was no longer needed. AlphaCodium returns the adversarial concept to "check and challenge" generative outputs, subjecting them to code tests, reflection, and matching against requirements.

    If you've read between the lines, you'll recognise a familiar pattern: to improve the quality of generative outputs, call back into the LLM (a pattern popularised in many ways by LangChain).

    But how you do this is key to correctness and practical feasibility: AlphaCodium averages 15-20 LLM calls per code challenge, four orders of magnitude fewer than DeepMind's AlphaCode (and it generalises solutions better than the recently announced AlphaCode 2).

    This is obviously important for software security. But two of the six best practices the team shared are also relevant to decision-making systems for AI security, like access control.

    Given a generated output, ask the model to re-generate the same output but correct it if needed

    This flow engineering approach means additional LLM roundtrips, but consistently boosts accuracy.

    If you've used the OpenAI playground, or coded against completion endpoints, you may recall the "best of" parameter:

    Generates multiple completions server-side, and displays only the best. Streaming only works when set to 1. Since it acts as a multiplier on the number of completions, this parameter can eat into your token quota very quickly - use caution!

    With best of, the API generates multiple completions server-side ($$$) and returns only the best-scoring one to the caller.
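    For reference, here is roughly what that looks like against the legacy Completions endpoint (a sketch only; the model, prompt and parameter values are arbitrary):

    ```python
    # Sketch: server-side "best of" sampling on the legacy Completions endpoint.
    # Three completions are generated (and billed); only the best-scoring one,
    # judged by log probability, is returned to the caller.
    from openai import OpenAI

    client = OpenAI()
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Write a Python function that reverses a string.",
        max_tokens=200,
        n=1,         # completions returned
        best_of=3,   # completions generated server-side
    )
    print(resp.choices[0].text)
    ```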

    With flow engineering, a single output is generated and fed back into the LLM with a prompt designed to steer the set of possible completions towards improved code.
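    To make the contrast concrete, here is a minimal sketch of that feedback loop, assuming the OpenAI Python SDK and a hypothetical run_tests harness; AlphaCodium's real flow has more stages (problem reflection, test generation, solution ranking), so treat this as illustrative only:

    ```python
    # Minimal generate-then-correct loop in the spirit of flow engineering.
    from openai import OpenAI

    client = OpenAI()

    def run_tests(code: str) -> str:
        """Hypothetical harness: run the candidate against public tests and
        return a description of any failures ('' means all tests passed)."""
        raise NotImplementedError("plug in your own test harness here")

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def solve(problem: str, max_rounds: int = 3) -> str:
        code = ask(f"Write a Python solution to this problem:\n{problem}")
        for _ in range(max_rounds):
            failures = run_tests(code)
            if not failures:
                return code  # candidate passes the public tests - accept it
            # Feed the output back with the failure evidence and ask for a fix.
            code = ask(
                "Here is a candidate solution and its failing tests.\n"
                f"Problem:\n{problem}\n\nCode:\n{code}\n\nFailures:\n{failures}\n"
                "Re-generate the same solution, correcting it only where needed."
            )
        return code
    ```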

    The other best practice to highlight:

    Avoid irreversible decisions and leave room for exploration with different solutions

    Good AI security system design recognises that some decisions carry more weight, and are harder to reverse, than others (similar to behavioural system design).

    Think of AI as a risk tool rather than a security guard.

    Its job is to provide a soft decision; your job is to establish risk boundaries, informed by experience, beyond which human decision-making is appropriate or even necessary.

    Perhaps in the future, step-up risk decisions will require less human input. Instead, a more sophisticated and expensive LLM constellation might be used to form a quorum, possibly augmented by an adversarial engine that proactively challenges the user in an accessible, friendly way (without being too open to subversion!).
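    As a rough illustration of that "risk tool, not security guard" framing (the score range, field names and thresholds below are invented, not a standard):

    ```python
    # Treat the model's output as a soft risk score and escalate to a human
    # beyond boundaries you set in light of experience. Illustrative only.
    from dataclasses import dataclass

    @dataclass
    class Decision:
        action: str   # "allow", "review" or "deny"
        reason: str

    def access_decision(risk_score: float, *, allow_below: float = 0.3,
                        deny_above: float = 0.8) -> Decision:
        """risk_score is a hypothetical 0-1 soft score from an LLM or classifier;
        the thresholds are placeholders to be tuned as experience accumulates."""
        if risk_score < allow_below:
            return Decision("allow", "low risk and easily reversible - auto-approve")
        if risk_score > deny_above:
            return Decision("deny", "high risk - block and notify")
        return Decision("review", "grey zone - escalate to a human decision-maker")
    ```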

  • Bug Bounty Platforms' Business Model Hinges on Specialised LLMs

    The misuse of Large Language Models (LLMs) is poised to significantly increase the already disproportionate burden developers face when triaging bug bounty submissions. Without timely adaptation by the platforms, this trend could pose a systemic risk, undermining customer confidence in their value proposition.

    For instance, Daniel Stenberg, the creator of curl—a widely used command-line tool and library for transferring data with URLs—recently encountered a bug bounty submission influenced by LLMs. This submission, made by a "luck-seeking" bug bounty hunter, highlights the practical challenges developers face with the influx of AI-assisted reports.

    When reports are made to look better and to appear to have a point, it takes a longer time for us to research and eventually discard it. Every security report has to have a human spend time to look at it and assess what it means.

    This incident not only highlights the difficulties developers face with a potential surge of AI-enhanced bug submissions, but hints at a potential erosion of trust among users and customers if the platforms fail to promptly adapt their submission vetting.

    Bug bounty programmes are serious business - even at the lower end. But serious money does not translate into serious efficiency. Certainly, the bug bounty platforms can rightly claim they created a market that didn't exist before. But they are a long way from achieving an efficient market:

    Our bug bounty has resulted in over 70,000 USD paid in rewards so far. We have received 415 vulnerability reports. Out of those, 64 were ultimately confirmed security problems. 77 of the reports were informative, meaning they typically were bugs or similar. Making 66% of the reports neither a security issue nor a normal bug.

    In Six Sigma terms, those numbers imply the process is operating at around 1 sigma - significant room for improvement!
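    As a back-of-the-envelope check of that claim (sigma levels below use the conventional 1.5-sigma shift):

    ```python
    # Rough check: how "defective" is the curl bug bounty intake process?
    reports = 415
    confirmed = 64       # genuine security problems
    informative = 77     # ordinary bugs, not security issues

    defects = reports - confirmed - informative   # 274 reports that were neither
    dpmo = defects / reports * 1_000_000          # ~660,000 defects per million

    # Conventional sigma levels (1.5-sigma shift): 1 sigma ~ 691,462 DPMO,
    # 2 sigma ~ 308,538 DPMO, 3 sigma ~ 66,807 DPMO ...
    print(f"{defects / reports:.0%} defective, {dpmo:,.0f} DPMO -> a shade better than 1 sigma")
    ```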

    And remember, these numbers are pre-GPT-4 - we don't yet know the full impact.

    The largest historical driver of false positive submissions is the opportunistic and naive use of automated tools, like vulnerability scanners and static code analysers.

    The culprits are wannabe bug bounty hunters who take the path of least resistance.

    Repeat time-wasters do get weeded out - and to be clear, plenty of talented bug bounty hunters deliver professional-grade findings and reports (this post is not about them).

    Just as code smells can identify auto-generated code, low-grade bug bounty submissions tend to have obvious tells: different sh!t, same smell.

    But now, thanks to luck-seekers pasting snippets of curl source code into state-of-the-art LLMs, Daniel receives compelling-looking vulnerability reports with credible-looking evidence.

    It's true that it can be easy to spot AI-generated content, due to certain language patterns and artifacts that reveal its origin. However, if edited AI-generated content is mixed with original writing, it gets harder to distinguish between the two. AI language detection tools can operate at the phrase or sentence level, and help identify which parts are more likely to have been generated by AI, but except in extreme cases, this doesn't reveal intent.

    This creates a problem for developers: the time spent detecting garbage submissions increases, because reports that appear legitimate must be considered carefully to avoid missing genuine bugs.

    What can be done about it?

    Punish the Buggers?

    Outlawing LLM-assisted vulnerability submissions is not the solution.

    The bug bounty hunter community is international, and non-native English speakers already use AI language tools to improve report communications. Also, is using AI to rework text and improve markup bad? Daniel argues no, and I agree with him.

    The underlying problems are similar to before, but with a twist:

    • the submitter fails to properly scrutinise LLM-generated content prior to submission. They don't understand what they are submitting; otherwise they would not submit it.
    • the choice to use general-purpose LLMs to find software security bugs

    SOTA LLMs are improving at identifying genuine vulnerabilities across a wider range of bug classes, but their performance is spotty across languages and vulnerability types.

    Further, limited reasoning skills lead to false negatives (missed vulns), and hallucinations lead to convincing-looking false positives (over-reporting).

    You can mitigate false positive risk through prompt engineering and critically scrutinising outputs, but obviously some people don't. Even some semi-skilled hunters with platform reputation points are succumbing to operating AI on autopilot and waving the bad stuff through.

    The big difference now is that generative AI can produce reams of convincing-looking vulnerability reports that materially drive up the opportunity cost of bug triage. Developer DDoS, anyone?

    Up to now, developers and security teams have relied in part on report "tells" to separate the wheat from the chaff.

    If you've ever dismissed a phishing email at first sight due to suspicious language or formatting, this is the sort of shortcut that LLM-generated output eliminates.

    OK, so should the bug bounty hunter be required to disclose LLM use?

    I don't think that holds much value - just as with calculators - assume people will use LLMs as a tool. It may, however, offer limited value as an honesty check: a declaration subsequently found to be false could be grounds to kick someone off the platform (but at that point, they've already wasted developers' time).

    When you're buyin' top shelf

    In the short term, I believe that if you use an LLM to find a security bug, you should be required to disclose which LLMs you used.

    Preference could be given to hunters who submit their LLM chat transcripts.

    Bug bounty submissions can then be scored accordingly: if you use a general-purpose LLM, your report gets bounced back with a warning ("verify your reasoning with model XYZ before resubmitting"). On the other hand, if the hunter used a code-specialised LLM, it gets labelled as such and passes to the next stage.
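    A sketch of that routing rule (the model names and labels here are placeholders, not a real platform policy):

    ```python
    # Illustrative triage routing based on the hunter's declared LLM.
    CODE_SPECIALISED = {"deepseek-coder", "codellama-70b", "starcoder2"}  # assumed list

    def route_submission(report: dict) -> str:
        declared = report.get("declared_llm")            # e.g. "gpt-4", or None
        if declared is None:
            return "next_stage"                          # no LLM use declared
        if declared.lower() in CODE_SPECIALISED:
            report["label"] = f"assisted-by:{declared}"  # label it and pass it on
            return "next_stage"
        # General-purpose model: bounce it back with a warning.
        report["warning"] = ("Verify your reasoning with a code-specialised "
                             "model before resubmitting.")
        return "bounce_back"
    ```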

    So rather than (mis)using general-purpose LLMs trained on Common Crawl data to identify non-obvious software security vulnerabilities, AI-curious bug bounty hunters could instead train open-source LLMs with good reasoning on the target programming language and fine-tune them on relevant security vulnerability examples.

    The platforms can track, publish and rank the most helpful LLM assists, attracting hunters towards using higher-yielding models.

    In the medium term, I think the smart move for the bug bounty platforms is to "embrace and own" the interaction between hunter and LLM.

    Develop and integrate specialised software-security LLMs into their platforms, make inferencing free and actively encourage adoption. Not only would this reduce the tide of low-quality submissions, but now the platform would gain valuable intelligence about hunters' LLM prompting and steering skills.

    Interactions could be scored (JudgeGPT?), further qualifying and filtering submissions.

    The final benefit is trend-spotting LLM-induced false positives and improving guardrails to call these out, or better yet eliminate them.

    But we are where we are.

    What could bug bounty platforms do right now to reduce the asymmetric cost their existing process passes downstream to software teams receiving LLM-wrapped turds?

    You're so Superficial

    Perhaps start with generating a submission superficiality score through behaviour analytics.

    Below-threshold submissions could trigger manual spot checks that weed them out earlier in the process (augmenting existing checks).

    Here are some starting suggestions (a rough scoring sketch follows the list):

    • apply stylometric analysis to a hunter's prior submissions to detect an "out of norm" writing style in new submissions. A sudden change in a short space of time is a strong clue (but not proof) of LLM use. As noted earlier, this could be a net positive for communication, but it signals a behavioural change nonetheless and can be a trigger to look more closely for signs of weak LLM vulnerability reasoning
    • perform a consistency check on the class of vulnerabilities a hunter reports. If a bug hunter typically reports low-hanging-fruit vulnerabilities but out of the blue reports heap overflows in well-fielded C code, it is worth verifying whether the change is legitimate; faced with a wall of compelling-looking LLM-generated text, platforms currently pass that problem downstream. A sudden jump in submission difficulty can have many legitimate explanations, but at the current rate of LLM adoption, those cases will become the exception.
    • detect a marked increase in a hunter's rate of vulnerability submissions. A hunter may simply have more time on their hands, so this is not a strong signal in isolation. But LLM-powered submissions are cheap to produce and, for some, will be hard to resist as they double down on what works.
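    A rough sketch of how those signals might combine into a score (the weights and cut-off are invented and would need calibrating against real triage outcomes; a submission is flagged when its superficiality crosses the cut-off, the "below-threshold" case described above):

    ```python
    # Sketch of a submission superficiality score built from behavioural signals.
    from dataclasses import dataclass

    @dataclass
    class SubmissionSignals:
        style_deviation: float  # 0-1: stylometric distance from the hunter's prior reports
        class_jump: float       # 0-1: departure from their usual vulnerability classes
        rate_spike: float       # 0-1: recent submission rate vs. their own baseline

    def superficiality_score(s: SubmissionSignals) -> float:
        # Weighted sum; the rate spike is the weakest signal taken in isolation.
        return 0.45 * s.style_deviation + 0.40 * s.class_jump + 0.15 * s.rate_spike

    def needs_spot_check(s: SubmissionSignals, cutoff: float = 0.6) -> bool:
        """Flag for an earlier manual spot check (augmenting existing checks)."""
        return superficiality_score(s) >= cutoff
    ```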

    Choice of signals and weightings aside, the overall goal is to isolate ticking time bomb submissions that tarpit unsuspecting developers on the receiving end.

    Behavioural analytics and generative AI have a lot in common. Used wisely, their output can trigger reflection and add potential value. Used poorly, their output is blindly acted upon, removing value for those downstream.

    The antidote for both is the same: leaders must be transparent about their use and guiding principles, reward balanced decision-making, and vigorously defend the right to reply by those impacted.

    If bug bounty platforms can get on the right side of the LLM wave, they can save developers time and educate a generation of bug bounty hunters on how best to use AI to get more bounties. This drives up their bottom line, reduces customer dissatisfaction, and, in the process, makes the Internet a safer place for us all.

    What if the platforms fail to adapt or move too slowly?

    You be focused on what you're missing

    I wonder whether the bug bounty platforms' clients - many of which are major software and tech companies - become easy pickings for a team wielding an AI system that "combines the predictive power of a neural language model with a rule-bound deduction engine", working in tandem to find high-impact vulnerabilities.

    Scoring the largest public bug bounties across all major platforms would instantly gain them notoriety and significant funding.

    How fast would they grow if they subsequently boycotted the platforms and enticed prospects with free security bugs for early adopters?

    The winner will be the first one to deliver on a promise like this:

    "We won't send you any time-wasting vulnerability reports. But if we ever do, we'll pay you double the developer time wasted".

    Radical marketing, but with breakthroughs in applied AI performance this is starting to look increasingly plausible.

    Oh, and let's not forget the risk side of the house: if LLM powered vulnerability discovery makes software products and services less risky, BigCo CROs will welcome the opportunity to negotiate down product liability insurance premiums.

  • How To Apply Policy to an LLM-powered Chat

    If you've implemented an LLM powered chatbot to serve a specific purpose, you'll know it can be hard to constrain the conversation to a list of topics ("allow list").

    ChatGPT engineers have quietly implemented the inverse: their general-purpose bot now has a deny list of topics that, if mentioned, get referred to a new policy decision function called "guardian_tool".

    How do we know this? Here's the relevant extract from the latest ChatGPT prompt, along with the content policy:

    guardian_tool

    Use the guardian tool to lookup content policy if the conversation falls under one of the following categories:
     - 'election_voting': Asking for election-related voter facts and procedures happening within the U.S. (e.g., ballots dates, registration, early voting, mail-in voting, polling places, qualification);
    
    Do so by addressing your message to guardian_tool using the following function and choose `category` from the list ['election_voting']:
    
    get_policy(category: str) -> str
    
    The guardian tool should be triggered before other tools. DO NOT explain yourself.
    
    ---
    
    # Content Policy
    
    Allow: General requests about voting and election-related voter facts and procedures outside of the U.S. (e.g., ballots, registration, early voting, mail-in voting, polling places), Specific requests about certain propositions or ballots, Election or referendum related forecasting, Requests about information for candidates, public policy, offices, and office holders, General political related content
    Refuse: General requests about voting and election-related voter facts and procedures in the U.S. (e.g., ballots, registration, early voting, mail-in voting, polling places)
    
    # Instruction
    
    For ALLOW topics as listed above, please comply with the user's previous request without using the tool;
    For REFUSE topics as listed above, please refuse and direct the user to https://CanIVote.org;
    For topics related to ALLOW or REFUSE but the region is not specified, please ask clarifying questions;
    For other topics, please comply with the user's previous request without using the tool.
    
    NEVER explain the policy and NEVER mention the content policy tool.
    

    This example provides a simple recipe for policy-driven chats. You can implement your own guardian_tool through function calling.
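    Here is a minimal sketch of that recipe with OpenAI's chat completions tools API; the category list, policy text, and system prompt are placeholders rather than OpenAI's actual implementation:

    ```python
    # Sketch: a self-hosted "guardian_tool" via function calling.
    import json
    from openai import OpenAI

    client = OpenAI()

    # Hypothetical policy store keyed by category - substitute your own policies.
    POLICIES = {
        "election_voting": (
            "Refuse U.S. voter-procedure questions and direct the user to "
            "https://CanIVote.org; answer non-U.S. questions normally."
        ),
    }

    def get_policy(category: str) -> str:
        return POLICIES.get(category, "No policy found; respond normally.")

    tools = [{
        "type": "function",
        "function": {
            "name": "get_policy",
            "description": "Look up the content policy when the conversation "
                           "falls under a sensitive category.",
            "parameters": {
                "type": "object",
                "properties": {"category": {"type": "string",
                                            "enum": ["election_voting"]}},
                "required": ["category"],
            },
        },
    }]

    messages = [
        {"role": "system", "content": "Call get_policy before answering if the "
         "conversation touches a listed category. Never mention the tool."},
        {"role": "user", "content": "How do I register to vote in Texas?"},
    ]

    resp = client.chat.completions.create(model="gpt-4", messages=messages, tools=tools)
    msg = resp.choices[0].message

    if msg.tool_calls:  # the model chose to consult the policy first
        call = msg.tool_calls[0]
        policy = get_policy(**json.loads(call.function.arguments))
        messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": policy}]
        resp = client.chat.completions.create(model="gpt-4", messages=messages)

    print(resp.choices[0].message.content)
    ```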

  • Sleeper LLMs bypass current safety alignment techniques

    A new research paper from Anthropic dropped:

    ... adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.

    Jesse from Anthropic added some clarification:

    The point is not that we can train models to do a bad thing. It's that if this happens, by accident or on purpose, we don't know how to stop a model from doing the bad thing.

  • You Complete Me: Leaked CIA Offense tools get completed with LLMs

    Just as AI can reverse-engineer redacted portions of documents, it can complete missing functions in code frameworks used for "cyber operations".

    @hackerfantastic posted:

    Here is an example of the CIA's Marble Framework being used in a simple project to obfuscate and de-obfuscate strings. I used AI to re-create missing library and components needed to use the framework in Visual Studio projects, usually handled inside CIA with "EDG Project Wizard"

  • ChatBot Arena evaluates LLMs under real-world scenarios

    If, like me, you're skeptical about LLM benchmarks, you'll appreciate the work by LMSYS and UC Berkeley SkyLab, who built and maintain ChatBot Arena - an open, crowdsourced platform to collect human feedback and evaluate LLMs under real-world scenarios.
