In a striking revelation, Microsoft’s AI assistant, Copilot, has been found to expose the contents of more than 20,000 now-private GitHub repositories belonging to industry giants such as Google, Intel, and, ironically, Microsoft itself. Although the repositories were set to private after their owners realized they contained sensitive material, their contents remain retrievable through Copilot, exposing a critical gap in data privacy practices.
A Discovery by Lasso: Unearthing Supposedly Inaccessible Data
Lasso, an AI security firm, uncovered this alarming behavior in late 2024, finding that Copilot retained, and continued to provide access to, repositories that had initially been public and were later set to private. “After realizing that any data on GitHub, even if public for just a moment, can be indexed and potentially exposed by tools like Copilot, we were struck by how easily this information could be accessed,” remarked Ophir Dror and Bar Lanyado of Lasso.
The issue was traced back to a caching mechanism in Bing, Microsoft’s search engine, which failed to update its index when repositories were made private. Although Microsoft attempted a fix by disabling Bing’s cached-link feature, the now-private repositories continued to surface through Copilot, calling the effectiveness of the fix into question.
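For teams wondering whether a repository that was briefly public still lingers in Bing’s index, a simple search query can offer a first signal. The sketch below is illustrative only: it assumes a Bing Web Search API (v7) key in a BING_API_KEY environment variable, the repository slug is a hypothetical placeholder, and a hit means the URL is still indexed, not necessarily that its cached content is retrievable through Copilot.

```python
import os
import requests

# Minimal sketch: ask Bing whether pages from a now-private repository
# still appear in its index. Assumes a Bing Web Search API (v7) key in
# BING_API_KEY; the repository slug is a hypothetical example.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"
REPO_SLUG = "example-org/example-repo"  # hypothetical repository


def indexed_pages(repo_slug: str) -> list[str]:
    """Return URLs that Bing still reports for the given GitHub repository."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": os.environ["BING_API_KEY"]},
        params={"q": f"site:github.com/{repo_slug}", "count": 50},
        timeout=10,
    )
    resp.raise_for_status()
    pages = resp.json().get("webPages", {}).get("value", [])
    return [page["url"] for page in pages]


if __name__ == "__main__":
    hits = indexed_pages(REPO_SLUG)
    if hits:
        print(f"Bing still indexes {len(hits)} page(s) for {REPO_SLUG}:")
        for url in hits:
            print(" ", url)
    else:
        print(f"No indexed pages found for {REPO_SLUG}.")
```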
Microsoft’s Partial Fixes and Ongoing Vulnerabilities
Lasso’s follow-up findings confirmed that while direct public access to the cached pages was blocked, the underlying data was never fully purged from Bing’s cache, leaving a back door through which Copilot could still access and reproduce the sensitive data. The fix, in other words, was only partial: “Although Bing’s cached link feature was disabled, cached pages continued to appear in search results,” Lasso’s researchers explained.
The Broader Impact and Legal Entanglements
This exposure not only highlights technical deficiencies but also intersects with ongoing legal battles: Microsoft has had certain tools removed from GitHub, alleging violations of multiple laws, including the Computer Fraud and Abuse Act and the Digital Millennium Copyright Act. Despite their removal, those tools were still accessible through Copilot, further complicating Microsoft’s legal and security position.
Microsoft’s Response and Recommendations for Developers
In response to the unfolding situation, Microsoft advised: “If users prefer to avoid making their content publicly available for training these models, they are encouraged to keep their repositories private at all times.” The statement, however, sidesteps the core issue: even brief public exposure can translate into prolonged, unauthorized access through AI tools like Copilot.
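As a starting point for acting on that advice, an organization can enumerate which of its repositories are currently public and review whether they should be. The sketch below uses the public GitHub REST API; the organization name is a hypothetical placeholder, and an optional GITHUB_TOKEN only raises rate limits.

```python
import os
import requests

# Minimal sketch: list an organization's currently public repositories via
# the GitHub REST API so their visibility can be reviewed. The org name is
# a hypothetical placeholder; GITHUB_TOKEN is optional.
API = "https://api.github.com"
ORG = "example-org"  # hypothetical organization


def public_repos(org: str) -> list[str]:
    """Return full names of the organization's currently public repositories."""
    headers = {"Accept": "application/vnd.github+json"}
    if token := os.environ.get("GITHUB_TOKEN"):
        headers["Authorization"] = f"Bearer {token}"
    names, page = [], 1
    while True:
        resp = requests.get(
            f"{API}/orgs/{org}/repos",
            headers=headers,
            params={"type": "public", "per_page": 100, "page": page},
            timeout=10,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            break
        names.extend(repo["full_name"] for repo in batch)
        page += 1
    return names


if __name__ == "__main__":
    for name in public_repos(ORG):
        print(name)
```

Note that such a listing reflects current visibility only: a repository that was public briefly and has since been made private will not appear, which is precisely the scenario this incident concerns. Reviewing visibility-change history where audit logs are available, and rotating any credentials ever committed to a once-public repository, remains the prudent course.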