- HBF offers around ten times the capacity of HBM, but at slower speeds
- GPUs will access larger datasets through a layered HBM-HBF memory hierarchy
- Writes on HBF are limited, requiring software to focus on reads
The explosion of AI workloads has put unprecedented pressure on memory systems, forcing companies to rethink how they deliver data to accelerators.
High-bandwidth memory (HBM) has served as a fast cache for GPUs, letting AI models read and process key-value (KV) cache data efficiently.
However, HBM is expensive and limited in capacity, while high-bandwidth flash (HBF) offers far greater capacity at slower speeds.
How HBF complements HBM
HBF’s design lets GPUs access a much wider data set, but it limits the number of writes to around 100,000 per module, which requires software to prioritize reading over writing.
HBF will integrate with HBM near AI accelerators and form a layered memory architecture.
Professor Kim Joungho of KAIST compares HBM to a bookshelf at home for quick studies, while HBF acts as a library with far more content but slower access.
“For a GPU to perform AI inference, it needs to read variable data called the KV cache from the HBM. Then it interprets this and spits out word by word, and I think it will use HBF for this task,” Professor Kim said.
“HBM is fast, HBF is slow, but its capacity is about 10 times larger. But while HBF has no limit on the number of reads, it has a limit on the number of writes, about 100,000. Therefore, when OpenAI or Google write programs, they must structure their software to focus on reads.”
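The read-heavy structuring Professor Kim describes could, in principle, look something like the sketch below: a hypothetical two-tier KV-cache allocator that keeps write-hot entries in HBM, places read-mostly entries (such as a long prompt’s KV blocks) in HBF, and retires HBF blocks that approach an assumed 100,000-write endurance budget. Every name and threshold here is illustrative; none of it comes from a vendor API.

```python
from dataclasses import dataclass, field

# Assumed per-block write-endurance budget, based on the ~100,000 figure in the article.
HBF_WRITE_ENDURANCE = 100_000

@dataclass
class Block:
    tier: str          # "HBM" or "HBF"
    writes: int = 0
    reads: int = 0

@dataclass
class TieredKVCache:
    """Hypothetical placement policy for an HBM + HBF hierarchy (illustrative only)."""
    blocks: dict = field(default_factory=dict)

    def put(self, key: str, hot: bool) -> None:
        # Write-hot data stays in HBM; cold, read-mostly data goes to HBF.
        blk = self.blocks.setdefault(key, Block(tier="HBM" if hot else "HBF"))
        if blk.tier == "HBF" and blk.writes + 1 >= HBF_WRITE_ENDURANCE:
            blk.tier = "HBM"   # retire a worn HBF block back to HBM (toy wear-out policy)
        blk.writes += 1

    def get(self, key: str) -> Block:
        # Reads carry no endurance cost, so the decode loop can stream KV data
        # from HBF on every generated token.
        blk = self.blocks[key]
        blk.reads += 1
        return blk

# Usage: a prompt's KV blocks are written once, then read on every decode step.
cache = TieredKVCache()
cache.put("layer0/prompt_kv", hot=False)   # single write lands in HBF
for _ in range(1024):                      # many reads during token generation
    cache.get("layer0/prompt_kv")
```

The point of the sketch is simply that reads are effectively free while writes are budgeted, which is the constraint Kim says companies like OpenAI or Google would have to design their software around.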
HBF is expected to debut with HBM6, where multiple HBM stacks are connected in a network, increasing both bandwidth and capacity.
The concept envisions future iterations such as HBM7 acting as a “memory factory,” where data can be processed directly from HBF without detours through traditional storage networks.
HBF stacks multiple 3D NAND dies vertically, much as HBM stacks DRAM, and connects them with through-silicon vias (TSVs).
A single HBF device can reach a capacity of 512 GB and achieve up to 1.638 TB/s of bandwidth, far exceeding the speeds of standard PCIe 4.0 NVMe SSDs.
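To put that bandwidth figure in perspective (the comparison below is my own, assuming roughly 7.9 GB/s of usable throughput for a PCIe 4.0 x4 NVMe SSD, a figure not stated in the source):

$$
\frac{1.638\ \text{TB/s}}{\approx 7.9\ \text{GB/s}} \approx 208\times
$$

In other words, a single HBF stack would offer on the order of two hundred times the sequential bandwidth of a fast consumer SSD.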
SK Hynix and Sandisk have shown diagrams in which the upper NAND layers connect through TSVs to a base logic die, forming a functional stack.
Prototype HBF chips require careful fabrication to avoid warping in the lower layers, and additional NAND stacks will further increase the complexity of the TSV interconnects.
Samsung Electronics and Sandisk plan to integrate HBF into Nvidia, AMD and Google AI products within the next 24 months.
SK Hynix will release a prototype later this month, while the companies are also working on standardization through a consortium.
The adoption of HBF is expected to accelerate in the HBM6 era, and Kioxia has already prototyped a 5TB HBF module using PCIe Gen 6 x8 at 64 Gbps. Professor Kim predicts that the HBF market may surpass the HBM market in 2038.
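For reference (and this is my reading of the figures, not something spelled out in the source), if the “64 Gbps” refers to the PCIe 6.0 per-lane signaling rate, an x8 link gives the Kioxia module roughly:

$$
64\ \text{Gbit/s} \times 8\ \text{lanes} = 512\ \text{Gbit/s} \approx 64\ \text{GB/s per direction, before protocol overhead}
$$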
Via Sisajournal (originally in Korean)