Microsoft Training Data Exposure

What happens when an AI team overshares training data and is vulnerable to a supply chain attack?

The Wiz Research Team delivers another meaningful scalp in its ongoing research into accidental exposure of cloud-hosted data:

we found a GitHub repository under the Microsoft organization named robust-models-transfer. The repository belongs to Microsoft's AI research division, and its purpose is to provide open-source code and AI models for image recognition. Readers of the repository were instructed to download the models from an Azure Storage URL. However, this URL allowed access to more than just open-source models. It was configured to grant permissions on the entire storage account, exposing additional private data by mistake. Our scan shows that this account contained 38TB of additional data -- including Microsoft employees' personal computer backups. The backups contained sensitive personal data, including passwords to Microsoft services, secret keys, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.

Wiz goes on to note:

This case is an example of the new risks organizations face when starting to leverage the power of AI more broadly, as more of their engineers now work with massive amounts of training data. As data scientists and engineers race to bring new AI solutions to production, the massive amounts of data they handle require additional security checks and safeguards.

If your organisation shares non-public data via cloud storage - particularly Azure - read the rest of Wiz's article to better understand SAS token risks and the available countermeasures.
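To illustrate the kind of countermeasure Wiz points to, here is a minimal sketch using the azure-storage-blob Python SDK: issuing a read-only, time-limited SAS token scoped to a single container, rather than an account-level token that exposes the whole storage account. The account name, key, and container name are placeholders, not values from the incident.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import ContainerSasPermissions, generate_container_sas

# Placeholder values -- substitute your own storage account and container.
ACCOUNT_NAME = "examplemodels"
ACCOUNT_KEY = "<storage-account-key>"
CONTAINER_NAME = "public-models"

# Scope the token to one container, read/list only, expiring in 7 days,
# instead of granting broad permissions on the entire storage account.
sas_token = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER_NAME,
    account_key=ACCOUNT_KEY,
    permission=ContainerSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)

# Shareable URL limited to the intended container and permissions.
print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/{CONTAINER_NAME}?{sas_token}")
```

The key design choice is narrowing scope, permissions, and lifetime at token-creation time, so a leaked link can only ever expose the data it was meant to share.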
