Intel’s “Gaudi 3” AI accelerator chip could give Nvidia’s H100 a run for its money

An Intel handout image of the Gaudi 3 AI accelerator.

On Tuesday, Intel revealed a new AI accelerator chip called Gaudi 3 at its Vision 2024 event in Phoenix. With strong claimed performance when running large language models (such as those that power ChatGPT), the company positioned Gaudi 3 as a replacement for Nvidia’s H100, a popular data center GPU that has been in short supply, although the shortage appears to be easing somewhat.

Compared to Nvidia’s H100 chip, Intel claims 50 percent faster training times on Gaudi 3 for both OpenAI’s GPT-3 175B LLM and the 7-billion-parameter version of Meta’s Llama 2. For inference, Intel claims its new AI chip delivers 50 percent faster performance than the H100 for Llama 2 and Falcon 180B, both relatively popular open-weight models.

Intel is targeting the H100 because of its high market share, but that chip is not Nvidia’s most powerful AI accelerator in the pipeline. The H200 and the Blackwell B200 have both surpassed the H100 on paper, but neither of those chips is out yet (the H200 is expected in the second quarter of 2024, which is basically any day now).

Meanwhile, the aforementioned H100 supply issues remain a major headache for tech companies and AI researchers who have to fight for access to any chips that can train AI models. This has led several tech companies such as Microsoft, Meta, and OpenAI (rumored) to explore their own AI-accelerator chip designs, although that custom silicon is usually produced by Intel or TSMC. Google has its own line of Tensor Processing Units (TPUs) that it has been using internally since 2015.

Given these issues, Intel’s Gaudi 3 could be a potentially attractive alternative to the H100 if Intel can hit an ideal price (which Intel hasn’t provided, but the H100 reportedly costs around $30,000–$40,000) and maintain adequate production. AMD also produces a competitive range of AI chips, such as the AMD Instinct MI300 series, which sell for around $10,000–$15,000.

Gaudi 3 performance

An Intel handout featuring the Gaudi 3 AI accelerator.

Intel says the new chip builds on the architecture of its predecessor, Gaudi 2, in which two identical silicon dies are connected via a high-bandwidth connection. Each die has 48 megabytes of central cache memory, surrounded by four matrix multiplication engines and 32 programmable tensor processor cores, bringing the total to 64 cores.

The chipmaker claims that Gaudi 3 delivers twice the AI compute performance of Gaudi 2 by using an 8-bit floating-point (FP8) infrastructure, which has become critical for training transformer models. The chip also offers a quadruple boost for computations using the BFloat16 number format. The Gaudi 3 also has 128GB of less-expensive HBM2e memory capacity (which could be price-competitive) and features 3.7TB/s of memory bandwidth.
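To give a rough sense of why 8-bit formats matter for large models, here is an illustrative back-of-the-envelope calculation (our sketch, not Intel’s figures): FP8 stores each weight in 1 byte versus 2 bytes for BFloat16, roughly halving the memory the weights alone occupy. The model sizes are taken from the article; everything else (ignoring activations, optimizer state, and KV cache) is a simplifying assumption.

```python
# Illustrative sketch: weight-memory footprint at different precisions.
# Assumes weights dominate memory use; ignores activations, optimizer
# state, and KV cache, which matter in practice.
GIB = 1024**3  # bytes per gibibyte


def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


llama2_7b = 7_000_000_000     # Llama 2 7B, per the article
falcon_180b = 180_000_000_000  # Falcon 180B, per the article

for name, params in [("Llama 2 7B", llama2_7b), ("Falcon 180B", falcon_180b)]:
    fp8 = weight_memory_gib(params, 1)   # FP8: 1 byte per weight
    bf16 = weight_memory_gib(params, 2)  # BFloat16: 2 bytes per weight
    print(f"{name}: {fp8:.1f} GiB @ FP8, {bf16:.1f} GiB @ BF16")
```

On these simplified numbers, Llama 2 7B weights fit comfortably in Gaudi 3’s 128GB of HBM2e at either precision, while Falcon 180B weights alone would exceed it even at FP8, which is one reason large models are typically sharded across multiple accelerators.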

Since data centers are notoriously power-hungry, Intel touts the Gaudi 3’s power efficiency, claiming 40 percent higher inference power efficiency than Nvidia’s H100 on the Llama 2 7B and 70B parameter models and the Falcon 180B parameter model. Eitan Medina, chief operating officer of Intel’s Habana Labs, attributes this advantage to Gaudi’s large matrix math engines, which he claims require significantly less memory bandwidth than other architectures.

Gaudi vs. Blackwell

An Intel handout image of the Gaudi 3 AI accelerator.

Last month, we covered Nvidia’s groundbreaking Blackwell architecture launch, including the B200 GPU, which Nvidia claims will be the world’s most powerful AI chip. It seems natural, then, to compare what we know about Nvidia’s best-performing AI chip to Intel’s current best-in-class.

For starters, the Gaudi 3 is being manufactured using TSMC’s N5 process technology, narrowing the gap between Intel and Nvidia in terms of semiconductor fabrication technology, according to IEEE Spectrum. The upcoming Nvidia Blackwell chip will use a custom N4P process, which reportedly offers modest performance and efficiency improvements over N5.

The Gaudi 3’s use of HBM2e memory (as we mentioned above) is notable compared to the more expensive HBM3 or HBM3e used in competing chips, offering a balance of performance and cost efficiency. This choice seems to emphasize Intel’s strategy to compete not only on performance, but also on price.

As for the raw performance comparison between the Gaudi 3 and the B200, that may not be known until the chips are released and benchmarked by a third party.

As the race to power the tech industry’s thirst for AI computation heats up, IEEE Spectrum notes that Intel’s next-generation Gaudi chip, codenamed Falcon Shores, remains a focal point of interest. It also remains to be seen whether Intel will continue to rely on TSMC’s technology or leverage its own foundry business and upcoming nanosheet transistor technology to gain a competitive edge in the AI accelerator market.
