- The Wayback Machine is once again threatened by artificial intelligence
- The AI boom has tripled the price of the large hard drives needed for this vast archive of the web
- This is an additional danger to the Wayback Machine, which is also in trouble due to news sites blocking its web crawler, again due to AI
It’s an increasingly desperate time for those trying to keep track of the web’s history, as AI once again proves to be a serious stumbling block to the efforts of the likes of the Internet Archive — and this time it’s all about skyrocketing hard drive prices.
You may recall that last month we covered a different angle on the difficulties AI has caused the Internet Archive’s Wayback Machine. The problem for the non-profit organization is that, as part of measures designed to prevent AI from scraping their content, online news sites are increasingly blocking the web crawler that the Internet Archive uses to compile the snapshots of web pages that make up the archive.
And now 404 Media reports (via Tom’s Hardware) that the Internet Archive is suffering from a shortage of hard drives caused by AI (since more large drives are needed in data centers for AI workloads).
Yes, the AI boom is not only about LLMs (Large Language Models) eating your RAM and SSDs, but also hard drives (as well as indirect effects on other components).
The huge hard drives – on the order of 30TB – that the Internet Archive needs to host the Wayback Machine’s historical record are now up to three times more expensive, or indeed sold out entirely. In this way, the AI boom is now a “very real problem that is costing us time and money,” Internet Archive founder Brewster Kahle commented to 404 Media.
The Wayback Machine’s library holds around 210 petabytes (210,000 TB) of web page snapshots and is growing by 100 TB daily, so you can appreciate the scale of web archiving that goes on here.
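To put those figures in terms of hardware, here is a rough back-of-the-envelope sketch in Python. It assumes every drive holds exactly 30TB and that only a single copy of the data is stored, ignoring redundancy, replication and filesystem overhead, so the real drive counts are likely higher. The function name is purely illustrative and not anything the Internet Archive uses.

```python
import math

# Figures quoted in the article
ARCHIVE_TB = 210_000    # ~210 petabytes of web page snapshots
DAILY_GROWTH_TB = 100   # growth per day
DRIVE_TB = 30           # assumed capacity per drive ("on the order of 30TB")


def drives_needed(capacity_tb, drive_tb=DRIVE_TB):
    """Minimum whole drives for a raw capacity (hypothetical helper).

    Assumes a single copy of the data, with no redundancy or overhead.
    """
    return math.ceil(capacity_tb / drive_tb)


total_drives = drives_needed(ARCHIVE_TB)              # drives to hold the archive once
yearly_drives = drives_needed(DAILY_GROWTH_TB * 365)  # extra drives for a year of growth

print(f"Drives to hold the archive once: {total_drives}")
print(f"Extra drives per year of growth: {yearly_drives}")
```

Under those assumptions, that works out to about 7,000 drives for the existing library and more than 1,200 new drives each year, which gives a sense of why a tripling of drive prices hurts so much.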
Wikipedia’s parent non-profit, the Wikimedia Foundation, is reportedly facing similar struggles, as you’d imagine. It has about 65 million articles to host, which takes up a lot of drive space. A spokesperson for the Wikimedia Foundation told 404 Media that the main issues are “the purchase of memory and hard drives”, along with delivery times for servers.
Analysis: many solutions – but what about tape?
So is the Wayback Machine really at risk? Are we going to see the wheels start to come off the ‘living history of the internet’? Well, there is no immediate danger, as apparently donors and the community around the Wayback Machine are pulling together to solve the problem of rising operating costs.
Still, this is clearly a concern going forward—and the blocking of the Internet Archive’s web crawler is even more so. The problem there is that the news sites block AI scraping, but those blocks can be bypassed if the owner of the AI instead targets the content via the Wayback Machine. It is a difficult issue, but negotiations are ongoing and hopefully both sides can reach some sort of solution.
And on the drive front, if you’re wondering why the Internet Archive can’t switch to tape as a storage medium, the catch is that it’s a ‘living’ archive of the web – as in, online, for people to access snapshots of those web pages as needed. As such, hard drives are required for access to be responsive. Tape is simply not up to snuff in terms of performance in this case.
The Internet Archive uses tape, mind, for long-term backup of content, but it’s only part of the puzzle in that regard. Hard drives are critical to the actual day-to-day functioning of the Wayback Machine as we know it, in terms of quickly serving users the content they need online.




