Private API keys and passwords found in AI training dataset: nearly 12,000 secrets leaked


  • Truffle Security found thousands of pieces of private information in plain view
  • The archive is used to train some of today's biggest LLMs
  • The researchers notified the affected vendors and helped fix the problem

Cybersecurity researchers have found thousands of login credentials and other secrets in the Common Crawl dataset.

Common Crawl is a nonprofit organization that provides a freely available archive of web data, collected through large-scale web crawling. According to recent estimates, the organization hosts more than 250 petabytes of web data, with monthly crawls adding petabytes more.
