- Google unveils next-generation TPUs – split into two series, 8t and 8i
- 8t superpods can deliver 121 ExaFlops, up from 42.5 last year
- 8i delivers 3x more SRAM and increased HBM
Google Cloud has announced its eighth generation of Tensor Processing Units (TPUs), designed specifically for the agentic shift currently underway in AI.
The upgrades unveiled at Google Cloud Next 2026 focus on longer context windows, multi-step reasoning and responsiveness at scale. To support them, Google's cloud infrastructure is being rebuilt around persistent memory, continuous inference and multi-model workloads.
This year we see two different TPUs designed to support massive HBM scaling, with Google Cloud emphasizing memory bandwidth as much as compute.
TPU 8t and 8i target trillion-parameter training in million-chip clusters
The first of the two TPUs, 8t, has been optimized for distribution across huge clusters for training foundation models. With around an 80% year-over-year improvement in performance per dollar, the company says the chip will train trillion-parameter models more efficiently.
Google Cloud explained that a single TPU 8t superpod can scale up to 9,600 chips and deliver 2 PB of shared HBM and 121 ExaFlops of computing. For comparison, last year Ironwood was rated at up to 9,216 chips in a superpod and 42.5 ExaFlops.
Google Cloud also warned of the “latency wall” we face in an always-on agentic era, hence the launch of 8i, a second chip that acts as a post-training and inference engine.
The TPU 8i sees around a 3x increase in on-chip SRAM to 384MB as well as 288GB of HBM, with pod size now up to 1,152 chips from 256, delivering 11.6 ExaFlops of performance (up from 1.2 ExaFlops).
In terms of energy and thermal efficiency, Google Cloud boasts up to 2x better performance per watt compared to its predecessor, Ironwood.
“We’ve innovated across hardware and software to enable our data centers to deliver six times more computing power per unit of electricity than they did just five years ago,” explained SVP and Chief Technologist of AI and Infrastructure Amin Vahdat.
General availability for Google Cloud customers is expected in the coming months, and of course the TPU 8t and TPU 8i will underpin the latest Gemini models.
The company also sees the eighth-generation hardware playing a role in the development of the next frontier models by distributing training beyond a single superpod using Pathways and JAX, unlocking scaling beyond a million TPU chips per training cluster. Some of the figures cited at the event remain entirely theoretical (though technically possible), with the TPUs yet to be deployed at such a scale.
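For readers curious what "distributing training with JAX" looks like in practice, the sketch below uses JAX's public sharding API to spread a batch across whatever accelerators are locally visible. It is purely illustrative and not Google's internal Pathways code: the mesh axis name, the toy `step` function and the array shapes are all made up for the example.

```python
# Toy data-parallel sharding with JAX's public API (illustrative only;
# this is not Pathways or Google's training stack).
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Build a 1D device mesh from every device this process can see
# (CPUs when run locally; TPU chips on Cloud TPU hosts).
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

# Shard a batch along its leading axis across the "data" mesh axis.
batch = jnp.arange(8 * 4, dtype=jnp.float32).reshape(8, 4)
sharded = jax.device_put(batch, NamedSharding(mesh, P("data", None)))

@jax.jit
def step(x):
    # Stand-in for one training step; jit compiles it for the given sharding.
    return jnp.tanh(x) @ jnp.ones((4, 4))

out = step(sharded)
print(out.shape)
```

On a real TPU pod the same program runs unmodified with more devices in the mesh, which is the portability JAX's sharding model is designed to provide.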