AI researchers at Microsoft accidentally exposed 38 TB of data

Microsoft said it has taken steps to fix the security error that led to 38 terabytes of private data being exposed.

According to The Hacker News, Wiz Research – a cloud-security startup – recently discovered a data leak in a Microsoft AI GitHub repository. The data is believed to have been accidentally exposed when the company published a bucket of open-source training data.

The exposed data included a backup of two former Microsoft employees’ workstations containing secret keys, passwords, and more than 30,000 internal Microsoft Teams messages.

The repository, named “robust-models-transfer”, is no longer accessible. Before being taken down, it provided source code and machine-learning models accompanying a 2020 research paper.

Wiz said the exposure stemmed from an overly permissive Shared Access Signature (SAS) token – an Azure feature that lets users share data via signed URLs, but one that is difficult to track and difficult to revoke. The issue was reported to Microsoft on June 22, 2023.

The repository’s README.md file instructed developers to download models from an Azure Storage URL, but the link unintentionally granted access to the entire storage account, exposing additional private data.

Wiz researchers said that in addition to the excessively broad scope, the SAS token was misconfigured to grant full-control rather than read-only permissions. If exploited, this would have let attackers not only view but also delete or overwrite every file in the storage account.
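Wiz’s point about revocation follows from how SAS tokens are constructed: the token is essentially an HMAC-SHA256 signature, computed with the storage account key, over a string that encodes the grant (permissions, expiry, and so on). The service keeps no record of issued tokens, so a leaked one cannot be individually revoked – only rotating the account key invalidates it. A minimal, simplified sketch (the field layout is abbreviated and not the exact Azure string-to-sign format):

```python
import base64
import hashlib
import hmac

def sign_sas(account_key_b64: str, string_to_sign: str) -> str:
    """SAS-style signature: base64(HMAC-SHA256(account key, string-to-sign))."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("utf-8")

# Simplified string-to-sign: the permissions ("r" = read-only, "rwdl" = full
# control) and the expiry date are baked into the signed string itself.
fields = ["mystorageacct", "r", "b", "sco", "", "2023-06-30T00:00:00Z",
          "", "https", "2022-11-02", ""]
string_to_sign = "\n".join(fields)

demo_key = base64.b64encode(b"not-a-real-account-key").decode()  # made-up key
sig = sign_sas(demo_key, string_to_sign)
print(sig)
```

Because the permissions field is part of the signed string, a token minted with `rwdl` instead of `r` silently carries full control until its expiry.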

Responding to the report, Microsoft said its investigation found no evidence that customer data was exposed and that no other internal services were put at risk by the incident. The company emphasized that customers do not need to take any action, adding that it has revoked the SAS token and blocked all external access to the storage account.

To mitigate similar risks, Microsoft has expanded its secret-scanning service to detect SAS tokens with overly permissive expirations or privileges. The firm also identified a bug that had caused the scanning system to misclassify the SAS URL in this repository.
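A secret scanner of this kind can be approximated with a pattern match for URLs that embed a SAS signature (the `sig=` query parameter). The sketch below is a loose heuristic for illustration only, not Microsoft’s actual scanning rules:

```python
import re

# Loose heuristic: flag Azure Blob Storage URLs carrying a SAS signature.
SAS_URL_RE = re.compile(
    r"https://[\w.-]+\.blob\.core\.windows\.net/\S*[?&]\S*sig=[\w%]+"
)

def find_sas_urls(text: str) -> list[str]:
    """Return every SAS-like URL found in a blob of text (e.g. a README)."""
    return SAS_URL_RE.findall(text)

readme = (
    "Download the models from "
    "https://acct.blob.core.windows.net/models?sv=2022-11-02&sp=r&sig=abc123 "
    "and unpack them locally."
)
print(find_sas_urls(readme))
```

Running such a check in CI before a repository is published would catch a signed URL pasted into documentation, which is exactly how the exposure began here.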

The researchers say that because Account SAS tokens lack security monitoring and governance, they should be avoided for external sharing as a precaution. Token-generation mistakes are easy to overlook and can expose sensitive data.
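One practical check before sharing a SAS URL is to inspect its query parameters, since the granted permissions (`sp`) and expiry (`se`) are visible in the URL itself. The following hypothetical audit helper flags write/delete grants and far-future expiries; the URL and the year threshold are made-up examples:

```python
from urllib.parse import urlparse, parse_qs

def audit_sas_url(url: str) -> list[str]:
    """Flag common over-permissioning mistakes in a SAS URL (illustrative only)."""
    params = parse_qs(urlparse(url).query)
    warnings = []
    perms = params.get("sp", [""])[0]
    for flag, meaning in (("w", "write"), ("d", "delete"), ("l", "list")):
        if flag in perms:
            warnings.append(f"token grants {meaning} access")
    expiry = params.get("se", [""])[0]
    # ISO 8601 timestamps compare correctly as strings.
    if expiry >= "2030":
        warnings.append(f"expiry {expiry} is far in the future")
    return warnings

url = ("https://mystorageacct.blob.core.windows.net/models"
       "?sv=2022-11-02&sp=rwdl&se=2051-10-05T00:00:00Z&sig=REDACTED")
print(audit_sas_url(url))
```

A correctly scoped token for sharing research artifacts would show `sp=r` and an expiry days or weeks out, and this check would return an empty list.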

Earlier, in July 2022, JUMPSEC Labs demonstrated how attackers could abuse such tokens to gain access to corporate environments.

Sensitive files were found in the backup by Wiz Research

This is Microsoft’s latest security incident. Two weeks earlier, the company disclosed that hackers based in China had stolen a highly sensitive cryptographic signing key after compromising the corporate account of a Microsoft engineer.

The latest incident shows the potential risks of introducing AI into large systems. Ami Luttwak, CTO of Wiz, believes that AI opens up huge potential for technology companies. However, as data scientists and engineers race to bring new AI solutions into use, the massive amounts of data they process require additional security protections and checks.

With many development teams needing to handle huge amounts of data, share it with colleagues, or collaborate on public open-source projects, incidents like Microsoft’s are increasingly difficult to track and prevent.
