How to Backdoor Diffusion Models?

BadDiffusion Attack: Exposing Vulnerabilities in Image Generation AI Models and Exploring Risk Mitigation Techniques.

Two researchers from IBM Research and Taiwan's National Tsing Hua University have published an eye-opening paper on backdooring the type of AI model used by popular image generation services:

...we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model.

The research, which draws on defensive watermarking, establishes that BadDiffusion backdoors are cheap and effective:

Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models.

They stumbled across a simple mitigation that works at the inference stage (when a trained model is applied to new data to make predictions or decisions): clipping the image at every step of the diffusion process defeats the backdoor whilst preserving the model's utility. Unfortunately, they conclude that this defence would not be effective against more sophisticated, evolving backdoor attacks.
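To make "clipping the image at every step" concrete, here is a minimal sketch, not the authors' code. It assumes a standard DDPM ancestral sampler, pixel values scaled to [-1, 1], and a linear noise schedule; `eps_model` is a hypothetical stand-in for a trained noise-prediction network, and `sample`, `clip` and `linear_beta_schedule` are illustrative names. Images are flattened to plain lists of floats so the sketch needs only the standard library.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical stand-in for a trained noise-prediction network eps_theta(x_t, t).
NoiseModel = Callable[[Vector, int], Vector]


def linear_beta_schedule(T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def clip(x: Vector, lo: float = -1.0, hi: float = 1.0) -> Vector:
    """Clamp every pixel to [lo, hi]; [-1, 1] is an assumed image range."""
    return [min(max(v, lo), hi) for v in x]


def sample(
    eps_model: NoiseModel,
    x_T: Vector,
    betas: List[float],
    clip_each_step: bool = True,
    rng: Optional[random.Random] = None,
) -> Vector:
    """DDPM ancestral sampling, optionally clipping the image after every step."""
    rng = rng if rng is not None else random.Random(0)
    alphas = [1.0 - b for b in betas]
    alpha_bars: List[float] = []
    running = 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    x = list(x_T)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Add fresh Gaussian noise with sigma_t = sqrt(beta_t), except at the final step.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
        if clip_each_step:
            # The inference-time defence: keep every intermediate image in range.
            x = clip(x)
    return x


if __name__ == "__main__":
    betas = linear_beta_schedule(50)
    zero_model: NoiseModel = lambda x, t: [0.0] * len(x)
    # A start point partly outside [-1, 1]; clipping pulls it back at every step.
    print(sample(zero_model, [3.0, -3.0, 0.5], betas))
```

Setting `clip_each_step=False` gives the plain sampler, which makes it easy to compare a model's outputs with and without the defence. The appeal of the technique is that it touches only the sampling loop, so a deployed model can be protected without retraining.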

What is diffusion?

The key idea is that nodes in a network are connected to each other in some way and can influence one another's states over time. Diffusion is typically achieved by iteratively updating each node's value based on local information and the weighted influence of its neighbours. It can be used to model the spread of information or influence across a social network; to smooth out noise, enhance edges and extract features in images and video; and to coordinate the behaviour of multiple agents or robots in a distributed system.
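The update rule described above can be sketched in a few lines. This is a minimal illustration, assuming a weighted, undirected graph stored as adjacency lists and a hypothetical step size `alpha`: at each step every node moves a fraction `alpha` of the way toward the weighted average of its neighbours. It is one common form of the update, not the only one, and `diffuse` is an illustrative name.

```python
from typing import Dict, List, Tuple

# Adjacency list: node -> [(neighbour, weight), ...]; weights assumed non-negative.
Graph = Dict[str, List[Tuple[str, float]]]


def diffuse(graph: Graph, values: Dict[str, float], alpha: float = 0.5, steps: int = 10) -> Dict[str, float]:
    """Repeatedly move each node's value toward the weighted average of its neighbours."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    x = dict(values)
    for _ in range(steps):
        new_x: Dict[str, float] = {}
        for node, neighbours in graph.items():
            total_w = sum(w for _, w in neighbours)
            if total_w == 0:
                new_x[node] = x[node]  # isolated node: nothing flows in
                continue
            neighbour_avg = sum(w * x[n] for n, w in neighbours) / total_w
            # Local state blended with the weighted influence of the neighbours.
            new_x[node] = (1.0 - alpha) * x[node] + alpha * neighbour_avg
        x = new_x  # synchronous update: every node used the previous step's values
    return x


if __name__ == "__main__":
    path = {"a": [("b", 1.0)], "b": [("a", 1.0), ("c", 1.0)], "c": [("b", 1.0)]}
    print(diffuse(path, {"a": 1.0, "b": 0.0, "c": 0.0}, steps=1))
    # {'a': 0.5, 'b': 0.25, 'c': 0.0}
```

On the three-node path above, repeating the update spreads the initial value until every node settles at 0.25, the degree-weighted average of the starting values. Treating each pixel as a node linked to its neighbours turns the same rule into a simple image-smoothing filter.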