Chinese GPU developer Moore Threads has introduced its Huagang architecture, promising significant advances in gaming and AI performance. Set for a 2026 launch, the design targets self-reliance in semiconductors amid global export curbs. While details remain sparse, the company highlighted ambitious benchmarks for upcoming products.
At the recent MUSA Developer Conference, Moore Threads presented its next-generation Huagang architecture, dubbed "Flowerpot" in some translations. This platform aims to power both gaming and artificial intelligence applications, with a full rollout planned for 2026. The announcement focused on performance projections rather than in-depth technical breakdowns, underscoring China's efforts to build domestic GPU capabilities in the face of international restrictions.
Central to the reveal is the Lushan gaming GPU, which will replace the existing MTT S80 and S90 models. Moore Threads asserts that Lushan will deliver a 15-fold increase in AAA game rendering speed and a 50-fold improvement in ray tracing performance. It incorporates a second-generation hardware ray tracing engine and full DirectX 12 Ultimate support for broader software compatibility. Memory capacity is expected to reach 64 GB, four times the 16 GB of GDDR6 on prior models. Other claimed gains include 64 times faster AI computation, 16 times faster geometry processing, four times higher texture fill rates, and eight times faster atomic memory operations. The architecture also introduces UniTE, a unified rendering system with an integrated AI processing unit.
Complementing this, the Huashan AI GPU features a dual-chiplet configuration equipped with nine HBM modules. The firm claims its performance rivals Nvidia's Hopper and Blackwell series, with memory bandwidth surpassing that of the Nvidia B200. Huashan supports a range of precision formats from FP4 to FP64, including the proprietary MTFP4, MTFP6, and MTFP8 formats. Clusters can scale beyond 100,000 units through MTLink 4.0, which offers 1,314 GB/s of interconnect bandwidth. Compared with the company's current offerings, Huashan promises a 50 percent rise in compute density and a tenfold improvement in efficiency.
Although no gaming demonstrations were shown, the company did present a benchmark on the forthcoming MTT S5000 GPU, a product unrelated to Huashan. It ran the DeepSeek V3 model at 1,000 tokens per second during decoding and 4,000 tokens per second during prefill, figures said to edge out Nvidia's Hopper. These developments reflect Beijing's drive toward technological independence, though the claims await independent validation as the products near market.
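To put the quoted S5000 throughput in concrete terms, the short Python sketch below turns the two rates into a rough end-to-end time for a single request. It is a back-of-the-envelope illustration only. The function name and the example token counts are hypothetical, and it assumes the 1,000 and 4,000 tokens-per-second figures apply to one request at steady state, which the announcement did not specify (they may be aggregate throughput across a batch).

```python
def estimate_request_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float = 4000.0,  # quoted prefill rate, tokens per second
    decode_tps: float = 1000.0,   # quoted decode rate, tokens per second
) -> float:
    """Rough request time: prefill the prompt, then decode the output.

    Assumption (not from the announcement): both rates apply to a single
    request and stay constant regardless of sequence length.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput rates must be positive")
    if prompt_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    prefill_seconds = prompt_tokens / prefill_tps
    decode_seconds = output_tokens / decode_tps
    return prefill_seconds + decode_seconds


if __name__ == "__main__":
    # Hypothetical request: an 8,000-token prompt and a 500-token reply.
    print(estimate_request_seconds(8000, 500))
```

Under these assumptions, the hypothetical 8,000-token prompt would take about 2 seconds to prefill and the 500-token reply about 0.5 seconds to decode. Real latency depends on batching, sequence length, and system configuration, none of which were disclosed.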