China-made Moore Threads AI GPUs appear competitive with unspecified Nvidia solutions: the company's MTT S4000 was used to train a three-billion-parameter LLM.

Moore Threads claims to be making great strides in AI GPU development, with its latest S4000 AI accelerator significantly faster than its predecessor. As reported by cnBeta, a training run on a new Kua'e Qianka Intelligent Computing Cluster built from S4000 GPUs ranked third in AI testing, outperforming several counterpart clusters based on Nvidia AI GPUs.

The benchmark run came from a stability test of the Kua'e Qianka Intelligent Computing Cluster. Training took 13.2 days in total and supposedly ran flawlessly for the entire duration, with no breakdowns or interruptions. The model used to benchmark the new computing cluster was the MT-infini-3B large language model.
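
For context, the reported 13.2-day run lets us sketch the cluster's implied sustained throughput using the common 6ND training-FLOPs approximation. This is a rough, illustrative estimate only: the parameter count and duration come from the report, while the token count is our own assumption, since Moore Threads hasn't disclosed it.

```python
# Back-of-envelope estimate of the cluster's sustained training throughput,
# using the common approximation: total training FLOPs ~= 6 * N * D,
# where N is the parameter count and D is the number of training tokens.
# N and the run time come from the report; D is a HYPOTHETICAL assumption.

PARAMS = 3e9     # MT-infini-3B: roughly 3 billion parameters (reported)
TOKENS = 60e9    # ASSUMED: ~20 tokens per parameter (Chinchilla-style ratio)
DAYS = 13.2      # reported training duration

total_flops = 6 * PARAMS * TOKENS    # ~1.08e21 FLOPs
seconds = DAYS * 24 * 3600           # ~1.14e6 seconds
throughput = total_flops / seconds   # ~9.5e14 FLOP/s, or ~0.95 PFLOP/s

print(f"Implied sustained throughput: {throughput / 1e15:.2f} PFLOP/s")
```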

(Image credit: cnBeta)

The new computing cluster is reportedly among the top AI GPU clusters of the same scale (presumably meaning the same number of GPUs). However, the table above clearly lacks details. For example, the MTT S4000 cluster was compared against unspecified Nvidia GPUs; we don't know whether they were A100, H100, or H200 parts, though we suspect the A100 is most likely. The workloads aren't identical either: training MT-infini-3B can be quite different from training Llama3-3B. In other words, sprinkle liberally with salt.

Even without an apples-to-apples comparison, however, training LLMs on Moore Threads GPUs represents an important step in China's domestic GPU roadmap. The Kua'e Qianka cluster at least suggests that the MTT S4000 AI GPUs are competitive with Nvidia's older-generation A100 architecture. This is backed up by the S4000's raw performance numbers, which not only significantly outperform Moore Threads' earlier S3000 and S2000 AI GPUs, but also beat Nvidia's Turing-based AI accelerators. The S4000 doesn't match Nvidia's A100, but it's probably not far off Ampere performance levels.

For Moore Threads, Kua'e Qianka's performance potential is a huge win, regardless of which Nvidia GPUs or LLMs were involved in the comparison. It shows that Moore Threads can now build AI GPUs that perform the same tasks as competing AI GPUs from Nvidia, AMD, and Intel. They may not perform as well, but it's an important step toward faster and more capable supercomputers and AI clusters.

This is an impressive feat for a GPU manufacturer founded less than five years ago. If Moore Threads can continue to deliver significant generational performance improvements, it could have an AI GPU accelerator that goes toe-to-toe with its Western counterparts within the next few years. That's a big “if” of course, and we know from historical precedent that GPU development doesn't always go as planned.

We'll also be interested to see whether Moore Threads can translate its apparently solid AI performance into its gaming graphics cards. To date, MTT GPUs have fared badly in gaming tests, thanks in part to immature drivers and optimization. Although AI requires a lot of computational power, it's a different workload from real-time graphics, so expertise in one area doesn't necessarily translate to the other.
