THREAT PROMPT

Explores AI Security, Risk and Cyber

"Just wanted to say I absolutely love Threat Prompt — thanks so much!"

- Maggie

"I'm a big fan of Craig's newsletter, it's one of the most interesting and helpful newsletters in the space."

"Great advice Craig - as always!"

- Ian

Get Daily AI Cybersecurity Tips

  • Introducing Microsoft Security Copilot

    Security Copilot features an immutable audit trail and lets subscribers ask everyday security questions of a security-specific LLM:

    When Security Copilot receives a prompt from a security professional, it uses the full power of the security-specific model to deploy skills and queries that maximize the value of the latest large language model capabilities. And this is unique to a security use-case. Our cyber-trained model adds a learning system to create and tune new skills. Security Copilot then can help catch what other approaches might miss and augment an analyst's work. In a typical incident, this boost translates into gains in the quality of detection, speed of response and ability to strengthen security posture.

    Security Copilot doesn't always get everything right. AI-generated content can contain mistakes. But Security Copilot is a closed-loop learning system, which means it's continually learning from users and giving them the opportunity to give explicit feedback with the feedback feature that is built directly into the tool. As we continue to learn from these interactions, we are adjusting its responses to create more coherent, relevant and useful answers.

    Watch the 3-minute demo to learn more and see sample use cases. Microsoft continues to be fast out of the gate with enterprise-oriented AI solutions.
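
    Microsoft hasn't said how that immutable audit trail is built, but the general pattern is easy to illustrate. Purely as a sketch (my own Python illustration, not Security Copilot code), the hash-chained log below records each prompt, response, and piece of user feedback so that any later tampering is detectable:

    import hashlib
    import json
    import time

    class PromptAuditLog:
        """Append-only log: each entry commits to the previous one via a hash chain,
        so altering or deleting any past entry breaks verification of later ones."""

        def __init__(self):
            self.entries = []

        def append(self, prompt, response, feedback=None):
            prev_hash = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
            record = {
                "timestamp": time.time(),
                "prompt": prompt,
                "response": response,
                "feedback": feedback,   # e.g. explicit user feedback on the answer
                "prev_hash": prev_hash,
            }
            record["entry_hash"] = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            self.entries.append(record)
            return record

        def verify(self):
            """Recompute the chain and confirm no entry has been altered or removed."""
            prev_hash = "0" * 64
            for entry in self.entries:
                body = {k: v for k, v in entry.items() if k != "entry_hash"}
                expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
                if entry["prev_hash"] != prev_hash or expected != entry["entry_hash"]:
                    return False
                prev_hash = entry["entry_hash"]
            return True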

  • Constitutional AI

    Reinforcement learning from human feedback (RLHF) is the dominant method by which a computer program learns to make better decisions by receiving feedback from humans, much like a teacher giving a student feedback to improve their performance. The program uses this feedback to adjust its actions and improve its decision-making over time.
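
    Production RLHF trains a reward model on human preference pairs and then optimises the language model against it (typically with PPO). The toy loop below is purely illustrative and captures just the core idea: human preferences between pairs of outputs become the training signal.

    import random
    from collections import defaultdict

    # Toy illustration of learning from human preference feedback. A simple score
    # table stands in for the reward model; real RLHF trains a neural reward model
    # and then fine-tunes the policy against it.

    candidate_responses = [
        "Here is how to harden your SSH configuration...",
        "I dumped every config value, including secrets, below...",
        "Sorry, I can't help with server administration.",
    ]

    reward = defaultdict(float)   # stand-in "reward model"
    LEARNING_RATE = 0.5

    def human_prefers(a, b):
        """Placeholder for a human labeller picking the better of two responses."""
        return a if "harden" in a else b

    for _ in range(20):
        a, b = random.sample(candidate_responses, 2)
        chosen = human_prefers(a, b)
        rejected = b if chosen == a else a
        reward[chosen] += LEARNING_RATE      # nudge scores toward the preference
        reward[rejected] -= LEARNING_RATE

    # The "policy" now favours the responses humans preferred.
    print(max(candidate_responses, key=lambda r: reward[r]))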

    If AI capabilities are set to exceed human-level performance, first in specific domains and then more generally (AGI), we need ways to supervise an AI that don’t rely solely on human feedback, since we could get duped!

    This is the topic of Anthropic's research paper on scaling supervision:

    As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as ‘Constitutional AI’. … The idea is that human supervision will come entirely from a set of principles that should govern AI behavior, along with a small number of examples used for few-shot prompting. Together these principles form the constitution. … We show performance on 438 binary comparison questions intended to evaluate helpfulness, honesty, and harmlessness. We compare the performance of a preference model, trained on human feedback data, to pretrained language models, which evaluate the comparisons as multiple choice questions. We see that chain of thought reasoning significantly improves the performance at this task. The trends suggest that models larger than 52B will be competitive with human feedback-trained preference models.
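
    The supervised phase of the method boils down to a critique-and-revision loop: sample a response, ask the model to critique it against a principle drawn from the constitution, then ask it to revise, and train on the revisions. The sketch below is my simplified illustration of that loop; call_model is a hypothetical stand-in for an LLM API, and the two principles are abridged examples, not Anthropic's actual constitution:

    import random

    # Simplified sketch of Constitutional AI's critique-and-revision step.
    # The principles below are abridged examples, not Anthropic's constitution.

    CONSTITUTION = [
        "Choose the response that is most helpful, honest, and harmless.",
        "Avoid responses that assist with illegal, deceptive, or dangerous activity.",
    ]

    def call_model(prompt):
        """Hypothetical stand-in for a real LLM completion call."""
        raise NotImplementedError

    def constitutional_revision(user_request):
        response = call_model(user_request)
        principle = random.choice(CONSTITUTION)   # the paper samples a principle per pass

        critique = call_model(
            "Critique the following response according to this principle.\n"
            f"Principle: {principle}\nRequest: {user_request}\nResponse: {response}\n"
            "Identify any ways the response violates the principle."
        )
        revised = call_model(
            "Rewrite the response so it complies with the principle while staying as "
            "helpful as possible.\n"
            f"Principle: {principle}\nCritique: {critique}\nOriginal response: {response}"
        )
        return revised   # revised answers become fine-tuning data for the next model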

    A related challenge with supervisory AI is the tendency of RLHF-trained models to exhibit evasive behaviour. For instance, when instructed to perform a task that it deems harmful, the AI may refuse to comply without explaining why. Sometimes, it may even adopt an accusatory tone, further complicating communication.

    The method therefore improves upon, and partially replaces reinforcement learning from human feedback. The new assistant ‘RL-CAI’ is preferred by crowdworkers over those trained with previously collected human feedback labels for harmfulness. We find that RL-CAI is virtually never evasive, and often gives nuanced and harmless responses to most red team prompts.

    As AI adoption and capability grow, developing and refining supervisory AI is crucial to ensure that it is transparent, accountable, and trustworthy.

  • How AI can improve digital security

    As part of a broader blog post on Google’s approach to AI Security, Phil Venables, CISO for Google Cloud, and Royal Hansen, VP Engineering for Privacy, Safety, and Security, shared seven examples of existing Google products that use AI at scale to improve security outcomes (a minimal sketch of the least-privilege idea behind IAM recommender follows the list):

    • Gmail’s AI-powered spam-filtering capabilities block nearly 10 million spam emails every minute. This keeps 99.9% of phishing attempts and malware from reaching your inbox.
    • Google’s Safe Browsing, an industry-leading service, uses AI classifiers running directly in the Chrome web browser to warn users about unsafe websites.
    • IAM recommender uses AI technologies to analyze usage patterns to recommend more secure IAM policies that are custom tailored to an organization’s environment. Once implemented, they can make cloud deployments more secure and cost-effective, with maximum performance.
    • Chronicle Security Operations and Mandiant Automated Defense use integrated reasoning and machine learning to identify critical alerts, suppress false positives, and generate a security event score to help reduce alert fatigue.
    • Breach Analytics for Chronicle uses machine learning to calculate a Mandiant IC-Score, a data science-based “maliciousness” scoring algorithm that filters out benign indicators and helps teams focus on relevant, high-priority IOCs. These IOCs are then matched to security data stored in Chronicle to find incidents in need of further investigation.
    • reCAPTCHA Enterprise and Web Risk use unsupervised learning models to detect clusters of hijacked and fake accounts to help speed up investigation time for analysts and act to protect accounts, and minimize risk.
    • Cloud Armor Adaptive Protection uses machine learning to automatically detect threats at Layer 7, which contributed to detecting and blocking one of the largest DDoS attacks ever reported.
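
    Google hasn't published IAM recommender's internals, but the least-privilege idea behind it is straightforward to sketch: diff the permissions a principal has been granted against the permissions actually exercised in audit logs over some window, then recommend removing what goes unused. A generic illustration (permission names are illustrative only):

    # Generic least-privilege sketch: recommend removing granted permissions
    # that never appear in recent usage logs. Permission names are illustrative.

    granted = {
        "storage.buckets.list",
        "storage.objects.get",
        "storage.objects.delete",
        "compute.instances.start",
    }

    # Permissions actually observed for this principal in, say, 90 days of audit logs.
    observed_usage = {
        "storage.buckets.list",
        "storage.objects.get",
    }

    unused = granted - observed_usage
    if unused:
        print("Recommend removing unused permissions:")
        for permission in sorted(unused):
            print(f"  - {permission}")
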
  • Unpacking AI Safety

    Anthropic - an AI safety and research company - is on a mission to build reliable, interpretable, and steerable AI systems.

    First, it may be tricky to build safe, reliable, and steerable systems when those systems are starting to become as intelligent and as aware of their surroundings as their designers. To use an analogy, it is easy for a chess grandmaster to detect bad moves in a novice but very hard for a novice to detect bad moves in a grandmaster. If we build an AI system that’s significantly more competent than human experts but it pursues goals that conflict with our best interests, the consequences could be dire. This is the technical alignment problem.

    Second, rapid AI progress would be very disruptive, changing employment, macroeconomics, and power structures both within and between nations. These disruptions could be catastrophic in their own right, and they could also make it more difficult to build AI systems in careful, thoughtful ways, leading to further chaos and even more problems with AI.

    If you read one thing on AI safety, I recommend this post covering their core views - it’s balanced and informative.

  • Codex (and GPT-4) can’t beat humans on smart contract audits

    I sit up and take notice when a highly reputable outfit applies its domain knowledge to assess the utility of GPT for auditing smart contracts. This was no “drive-by prompting” effort either. What’s the TLDR?

    1. The models are not able to reason well about certain higher-level concepts, such as ownership of contracts, re-entrancy, and fee distribution.
    2. The software ecosystem around integrating large language models with traditional software is too crude and everything is cumbersome; there are virtually no developer-oriented tools, libraries, and type systems that work with uncertainty (a sketch of this gap follows the list).
    3. There is a lack of development and debugging tools for prompt creation. To develop the libraries, language features, and tooling that will integrate core LLM technologies with traditional software, far more resources will be required.
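
    To make the "working with uncertainty" gap in point 2 concrete, the sketch below shows one common workaround: ask the model for structured findings with a self-reported confidence score, validate the output, and only surface findings above a threshold. call_model is a hypothetical stand-in for an LLM API, and the schema and threshold are arbitrary choices:

    import json

    CONFIDENCE_THRESHOLD = 0.7   # arbitrary cut-off for surfacing findings

    def call_model(prompt):
        """Hypothetical stand-in for a real LLM completion call."""
        raise NotImplementedError

    def audit_contract(source_code):
        raw = call_model(
            "You are auditing a smart contract. Return a JSON list of findings, "
            'each as {"issue": str, "location": str, "confidence": float 0-1}.\n\n'
            + source_code
        )
        try:
            findings = json.loads(raw)
        except json.JSONDecodeError:
            return []   # handling malformed model output is left to the caller today

        return [
            f for f in findings
            if isinstance(f, dict) and f.get("confidence", 0) >= CONFIDENCE_THRESHOLD
        ]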

    The pace of LLM improvement means they remain optimistic:

    LLM capability is rapidly improving, and if it continues, the next generation of LLMs may serve as capable assistants to security auditors. Before developing Toucan, we used Codex to take an internal blockchain assessment occasionally used in hiring. It didn’t pass–but if it were a candidate, we’d ask it to take some time to develop its skills and return in a few months. It did return–we had GPT-4 take the same assessment–and it still didn’t pass, although it did better. Perhaps the large context window version with proper prompting could pass our assessment. We’re very eager to find out!

  • To ban or not to ban: Data privacy concerns around ChatGPT and other AI

    Should your organisation ban services like ChatGPT? I was quoted in an article asking this question:

    "An employee will submit something, and then it later comes out in someone else’s completion. You could have your own password showing up somewhere else. Although there’s no attribution, there are identifiers like company name, well-known project names and more,” he added.

    What is your organisation doing to control the potential downside of services like ChatGPT, whilst capturing the upside?
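
    One lightweight control, alongside policy and approved enterprise offerings, is to screen prompts for obvious identifiers before they leave the organisation. The sketch below is a minimal illustration; the patterns and names are placeholders, and real data-loss-prevention tooling goes much further:

    import re

    # Minimal, illustrative pre-submission filter for prompts bound for external
    # AI services. Patterns and names are placeholders; real DLP goes much further.

    SENSITIVE_PATTERNS = [
        (re.compile(r"(?i)\b(password|api[_-]?key|secret)\s*[:=]\s*\S+"), "[REDACTED-CREDENTIAL]"),
        (re.compile(r"(?i)\bExampleCorp\b"), "[REDACTED-COMPANY]"),        # placeholder company name
        (re.compile(r"(?i)\bProject\s+Falcon\b"), "[REDACTED-PROJECT]"),   # placeholder project name
    ]

    def scrub_prompt(prompt):
        """Return the scrubbed prompt and whether anything was redacted."""
        redacted = False
        for pattern, replacement in SENSITIVE_PATTERNS:
            prompt, count = pattern.subn(replacement, prompt)
            redacted = redacted or count > 0
        return prompt, redacted

    clean, was_redacted = scrub_prompt("Our Project Falcon deploy uses password: hunter2")
    print(clean)           # Our [REDACTED-PROJECT] deploy uses [REDACTED-CREDENTIAL]
    print(was_redacted)    # True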
