Last Updated on September 20, 2023 by Larious
Researchers in Microsoft’s AI division accidentally leaked 38TB of the company’s private data while publishing open-source AI training material on GitHub.
The exposure, which had been ongoing since July 2020, was discovered three years later by researchers at cloud security firm Wiz.
Microsoft AI Team Accidentally Exposes 38TB Of Company Data
According to Wiz, the exposed data included a disk backup of two employees’ workstations. The disk backup contained corporate secrets, private keys, passwords, and over 30,000 internal Microsoft Teams messages from 359 Microsoft employees.
The researchers at Wiz came across the issue during their ongoing internet scans for misconfigured storage containers.
“We found a GitHub repository under the Microsoft organization named robust-models-transfer. The repository belongs to Microsoft’s AI research division, and its purpose is to provide open-source code and AI models for image recognition,” the company explained in a blog post.
Readers of the GitHub repository were instructed to download the models from an Azure Storage URL. Microsoft shared the files using Shared Access Signature (SAS) tokens, an Azure feature for granting access to data in Azure Storage accounts, with permissions that can range from read-only up to full control.
Although the access level can be limited to specific files, the AI division researchers accidentally shared a link that was configured to expose the entire storage account, including another 38TB of private files.
Besides the excessively permissive access scope, the token was also misconfigured to allow “full control” permissions instead of read-only. This meant that an attacker could not only view all the files in the storage account but could also delete and overwrite existing files.
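The two misconfigurations described above, an overly broad scope and write-capable permissions, are both visible in a SAS URL's query string. The following is a minimal sketch of how such a link could be checked before sharing, not the method Wiz or Microsoft used. It assumes the standard SAS query parameters (`srt` for account-level resource types, `sp` for permissions, `se` for expiry). The function name `audit_sas_url`, the set of permissions treated as risky, the seven-day lifetime threshold, and the example URL are all hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Assumption: permission letters treated as "more than read-only"
# (write, delete, create, add, update).
WRITE_LIKE = set("wdcau")


def audit_sas_url(url, now=None, max_days=7):
    """Return a list of warnings derived from a SAS URL's query parameters."""
    now = now or datetime.now(timezone.utc)
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    warnings = []

    # Account-level SAS tokens carry 'srt' (signed resource types);
    # service-level tokens scoped to a container or blob carry 'sr' instead.
    if "srt" in params:
        warnings.append("account-level token: scope is not limited to specific files")

    # 'sp' lists the granted permissions as single letters, e.g. 'r' or 'rwdl'.
    risky = sorted(set(params.get("sp", "")) & WRITE_LIKE)
    if risky:
        warnings.append("grants more than read access: " + "".join(risky))

    # 'se' is the expiry time in ISO 8601 form.
    expiry = params.get("se")
    if expiry is None:
        warnings.append("no expiry parameter found")
    else:
        expires = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)
        if (expires - now).days > max_days:
            warnings.append("expires more than %d days from now" % max_days)

    return warnings


if __name__ == "__main__":
    # Hypothetical URL; the account name, dates, and signature are placeholders.
    example = (
        "https://exampleaccount.blob.core.windows.net/models"
        "?sv=2020-08-04&ss=b&srt=sco&sp=rwdl&se=2051-10-06T07:00:00Z&sig=REDACTED"
    )
    for warning in audit_sas_url(example):
        print(warning)
```

A check like this only inspects the link itself. It cannot tell whether a token has already been used or revoke it, which is why invalidating the token on the storage account side was the actual remedy in this incident.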
Wiz reported the incident to the Microsoft Security Response Center (MSRC) on June 22, 2023. Microsoft invalidated the SAS token on June 24, 2023, blocking all external access to the Azure storage account.
Microsoft completed its investigation into the potential organizational impact on August 16, 2023, and publicly disclosed the incident on Monday, September 18, 2023.
In an advisory published on Monday, the MSRC team said, “No customer data was exposed, and no other internal services were put at risk because of this issue. The root cause issue for this has been fixed, and the system is now confirmed to be detecting and properly reporting on all over-provisioned SAS tokens.”
It added, “No customer action is required to respond to this issue. Our investigation concluded that there was no risk to customers as a result of this exposure.”