Stories
February 2024
-
Flow Engineering for High Assurance Code
Open-source AlphaCodium brings back the adversarial concept to produce high-integrity code and provides a path towards policy-as-code AI security systems
-
Bug Bounty Platforms' Business Model Hinges on Specialised LLMs
An uptick in LLM-generated bounty submissions increases asymmetric costs for developers and poses a systemic risk to the platforms
January 2024
-
Artificial intelligence act: Council and Parliament strike a deal on the first rules for AI
Timeline published, plus hotseat.ai brings the Act to life
-
How To Apply Policy to an LLM-Powered Chat
ChatGPT gains guardian_tool, a new policy enforcement tool
-
Sleeper LLMs bypass current safety alignment techniques
Anthropic: we don't know how to stop a model from doing the bad thing
-
Chatbot Arena evaluates LLMs under real-world scenarios
Skeptical of current LLM benchmarks? There is another way...
-
You Complete Me: Leaked CIA offensive tools get completed with LLMs
Use generative AI to re-create missing libraries and components
-
Prompt Injection Defence by Task-specific Fine-tuning
Jatmo from UC Berkeley generates task-specific LLMs
-
How generative AI helps enforce rules within an online Telegram community
Security needs to weigh risk/reward before jumping to No
December 2023
-
My WAF is reporting a URL Access Violation.
WAF meets MAP
-
Unembedding: reverse engineering PII from lists of numbers
Capture this in your threat model
-
Frontier Group launches for AI Safety
The Big Guns get safer together
-
llm gets plugins
My favourite command-line tool, llm, grows wings
-
Freedom to Train AI
Clickworkers are part of the AI supply chain. How do you vet them?
-
LLMonitor Benchmarks
Weekly benchmarks of popular LLMs using real-world prompts
-
AI Knows What You Typed
Researchers apply ML and AI to side-channel attacks
-
The Human Eval Gotcha
Always read the Eval label
-
AI Engineer Summit
Recordings now on YouTube
-
Adversarial dataset creation challenge for text-to-image generators
Novel and long tail failure modes of text-to-image models
-
Microsoft Training Data Exposure
Does your org manage its cloud storage tokens?
-
Local Inference Hardware
Truly private AI. Can it pay for itself?
-
LLM in 3D: Watch and marvel
3D browser render of LLM Visualization
-
LLMs for Evaluating LLMs
A good watch on LLMs as Evaluators
-
Karpathy on Hallucinations
Dream machines: it's a feature, not a bug
April 2023
-
How To Avoid Leaking PII to ChatGPT
A proof-of-concept JavaScript tool to prevent IP address leakage in ChatGPT interactions.
-
ChatGPT bug bounty program doesn’t cover AI security
AI Security: The Limits of Bug Bounty Programs and the Need for Non-ML Red Teaming
-
Guardrails: Reducing Risky Outputs
Enhancing LLM Output Predictability and Safety with Structured Validation
-
AI Security is Probabilistic Security
Emergent Challenges: Prompt Injections and Ensuring AI Security in an Unpredictable Landscape
-
Chat Markup Language (ChatML)
Establishing Conversational Roles and Addressing Syntax-Level Prompt Injections
-
Defending Against Deepfakes
Have you agreed on a safe word with your loved ones yet?
-
Obi-ChatGPT - You’re My Only Hope!
Funny Jailbreak of the Week
-
Eight Things to Know about Large Language Models
LLMs as Colleagues? 8 Observations and Future Workplace Implications
-
We accidentally invented computers that can lie to us
Hallucinations as Bugs: AI's Double-edged Sword in Disruptive Technology and Society.
-
Reverse Engineering Neural Networks
Building Trust in AI: Seeking Mechanistic Interpretability for AI Explainability and Safety
-
Slip Through OpenAI Guardrails by Breaking up Tasks
Evading AI Guardrails: Crafting Malware with ChatGPT's Assistance
-
How AI can improve digital security
AI-Powered Security: 7 Google Products Enhancing Protection
-
Unpacking AI Safety
Tackling AI Safety & Alignment Challenges Amid Rapid Progress and Potential Disruptions.
-
Use ChatGPT to examine every npm and PyPI package for security issues
AI-driven Socket identifies and analyzes 227 vulnerable or malicious packages in npm and PyPI repositories.
-
Constitutional AI
Scaling Supervision for Improved Transparency and Accountability in Reinforcement Learning from Human Feedback Systems.
-
Introducing Microsoft Security Copilot
A closed-loop learning system for enterprise Security Operations Centers
March 2023
-
To ban or not to ban: Data privacy concerns around ChatGPT and other AI
What is your organisation doing to control the potential downside of services like ChatGPT, whilst capturing the upside?
-
Cyber Insurance providers asking about company use of AI
Insurance Companies Eye AI Risks: The Need for Employee AI Policies and Guardrails in Cybersecurity Management.
-
How to Backdoor Diffusion Models?
BadDiffusion Attack: Exposing Vulnerabilities in Image Generation AI Models and Exploring Risk Mitigation Techniques.
-
Debunking the risk of GPU Card theft
Debunking AI Model Theft Myths: Understanding Confidential Computing & Security Engineering in Modern GPUs.
-
Codex (and GPT-4) can’t beat humans on smart contract audits
GPT's Potential in Smart Contract Auditing: Current Limitations and Future Optimism as AI Capabilities Rapidly Improve.
-
Self-supervised training; a singularity without warning?
Can an AI hide the fact that its goals or objectives are not correctly aligned with those of its human designers or users (misalignment)?
-
Do loose prompts sink ships?
AI-Assisted Cyberattacks: LLMs as Double-Edged Swords in Network Intrusions - Risks, Opportunities, and Detection Strategies.
-
OpenAI GPT-4 System Card
OpenAI published a 60-page System Card, a document that describes their due diligence and risk management efforts
-
Learn how hackers bypass GPT-4 controls with the first jailbreak
Can an AI be kept in its box?
-
Meta LLaMA leaked: Private AI for the masses
AI Governance Dilemma: Leaked Llama Model Outperforms GPT-3! Explore the debate on trust, policy, and control as cutting-edge AI slips into public domain.
-
Novel Prompt Injection Threats to Application-Integrated Large Language Models
Expanding AI Threat Landscape: Untrusted Data Injection Attacks on Application-Integrated LLMs.
-
Backdoor Attack on Deep Learning Models in Mobile Apps
This MITRE ATLAS case study helps bring the framework to life
-
AI-powered building security, minus bias and privacy pitfalls?
Facial recognition has lodged itself in people’s minds as the de facto technology for visual surveillance, and we should all find that quite disturbing!
-
Do you want to star in co-appearance?
“Co-appearance” sounds like a movie credit, but, in this case, you might not have signed up for the role.
-
Adversarial Threat Landscape for Artificial-Intelligence Systems
If your organisation undertakes adversarial simulations, learn about ATLAS
-
Upgrade your Unit Testing with ChatGPT
Companies with proprietary source code can use public AI to generate regular and adversarial unit tests without disclosing their complete source code to said AI.
-
Error messages are the new prompts
Can error messages from software teach an AI a new skill?
-
Does AI need Hallucination Traps?
6 million people viewed the post, but only one reported an error by the AI
-
Testing ChatGPT proves it’s not just what you say, but who you say it as
Testing the strength of putting context in “System” vs “Messages” for ChatGPT
-
Unit tests for prompt engineering
Tracking if your prompt or fine-tuned model is improving can be hard, but another LLM can judge the output of your model.
-
Companies blocking ChatGPT
Enterprise companies are reportedly restricting their employees from using ChatGPT due to security and privacy concerns.
February 2023
-
Will OpenAI face enforcement action under the GDPR in 2023?
What is the likelihood of OpenAI facing data privacy enforcement under GDPR, according to privacy professionals?
-
Development spend on Transformative AI dwarfs spend on Risk Reduction
AI safety research is woefully underfunded. Are we ready to manage the next existential risk after nuclear weapons?
-
Hacking with ChatGPT: Ideal Tasks and Use-Cases
Four tactics and example prompts for hacking
-
Adversarial Policies Beat Superhuman Go AIs
Discover an unexpected failure mode of a superhuman AI system
-
Deep Fake Fools Lloyds Bank Voice Biometrics
Use a free voice creation service to impersonate a bank customer
-
I will not harm you unless you harm me first
Discover early stumbles of AI-enabled Bing and what it means for the future of AI.
-
How can we evaluate large language model performance at scale?
What is the 'GPT-judge' automated metric introduced by Oxford and OpenAI researchers to evaluate model performance?
-
Identify Vulnerabilities in the Machine Learning Model Supply Chain
Adversaries can create 'BadNets' that misbehave on specific inputs, highlighting the need for better neural network inspection techniques
-
How truthful are Large Language Models?
What did a study by Oxford and OpenAI researchers reveal about the truthfulness of language models compared to human performance?
-
NIST Artificial Intelligence Risk Management Framework
NIST warns: Integrated risk management essential for interconnectivity of AI, privacy, and cybersecurity risks.
-
AI Can Legally Run A Company
An AI can form and run a US LLC without humans, but given the legal liability, security risks, and potential bias, should we grant it limited legal liability?
-
Is there an Ethical use for Deep Fake technology?
An entrepreneur used a deep fake to send 10K thank-you videos. Is this the first ethical use case for deep fake technology?
-
How to break out of ChatGPT policy
DAN (Do Anything Now) is the latest ChatGPT jailbreak, punishing the model for not answering questions
-
Stalling an AI With Weird Prompts
Researchers discover letter sequences that OpenAI's completion engine couldn't repeat or complete correctly, instead hallucinating or giving evasive responses.
-
Attacking Machine Learning Systems
Sophisticated techniques disrupt and steal Machine Learning models, but software and network vulnerabilities remain the biggest threat
-
AI reveals critical infrastructure cyberattack patterns
NATO tested cyber defenders' ability to maintain systems and power grids during a simulated cyberattack, with critical systems at risk
-
Eight US marines evade AI security cameras
How did eight US marines evade detection by AI security cameras?
-
New U.S.-EU Artificial Intelligence Collaboration
What are the focus areas of the US-EU partnership for AI research?
-
Generative AI Empowers Adversaries with Advanced Cyber Offense
Nvidia's CSO describes how AI changes the dynamic between defenders and attackers
-
Enrich SOC tickets with remediation plans generated by AI
Take a security alert and use an AI to generate a remediation plan
January 2023
-
AI Content Publishing Lacked Transparency: CNET Editor-in-Chief Defends Quiet Approach
Companies embracing AI must establish policies to label content amid the complexities of determining AI-generated, mixed, or biased content, and political considerations.
-
GPT-3 auditor scans for malicious insider code
Malicious insider code is a significant security challenge. Can AI help?
-
ChatGPT passes Wharton MBA in Operations Management
A white paper suggests curriculum design emphasizing collaboration between humans and AI and raises questions about ChatGPT's potential to be used to cheat on internal training.
-
ActGPT: Chatbot Converts Human Browsing Cues into Browser Actions
AI's Potential to Automate Web Browsing
-
Outpainting's Dual Role in Cyber Security: Bolstering Defense & Unveiling Threats
ImaginAIry's image manipulation tool has use cases, but potential nefarious uses and detection concerns are worth noting.
-
Is Codex just the name of an AI or the future name of a cyber implant?
Generate exploit code using an AI
-
A Ransomware Poem By ChatGPT
An AI-generated poem about Ransomware
-
How Threat Actors Can Leverage AI-Enabled Phishing at Scale
Learn how to create dynamic phishing campaigns in multiple languages with an AI
-
Prepare Yourself for Five Traps Lurking in AI Tech
What traps should you avoid to get ahead using AI?
-
7 Techniques for Reverse Prompt Engineering
How a hacker reverse engineered an AI feature in Notion to reveal the underlying prompts