AMD MI355X Bridges Inference Gap with Wafer AI's GLM5.2

WireByte Staff · July 4, 2026

Wafer AI's GLM5.2 inference model achieves 2626 tokens per second per node on AMD's MI355X, outperforming NVIDIA's Blackwell at less than half the cost. The result addresses demand for inference that has outpaced supply as new models are released frequently.

Key points

  • Wafer AI's GLM5.2 model ran on AMD's MI355X GPU at 2626 tokens per second per node.
  • The MI355X costs less than half as much as NVIDIA's Blackwell GPU.
  • Demand for inference is outpacing supply as new models are released frequently.
  • AMD's Instinct MI350 series competes with Blackwell at the silicon level but lacks day-0 support, which leads to performance gaps.
  • Wafer AI has shown that kernel and model optimization can close the gap, making AMD a viable alternative.

The rapid development of large language models has led to a surge in demand for inference, a process that requires significant computational resources. However, the supply of high-performance GPUs, such as NVIDIA's Blackwell, has failed to keep pace. This has resulted in rising costs and longer wait times for users.

Wafer AI's recent result on AMD's MI355X GPU offers a potential solution to this problem. The MI355X, which costs less than half as much as Blackwell, ran Wafer AI's GLM5.2 model at 2626 tokens per second per node. The result matters for the industry because it provides a more affordable alternative to NVIDIA's high-end GPUs as demand for inference grows.
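To make the cost claim concrete, throughput and node price can be combined into a cost per million tokens. The sketch below is illustrative only: the 2626 tokens per second figure comes from the result above, while the hourly node price is a hypothetical placeholder, since no actual prices are reported here. The function name `cost_per_million_tokens` is likewise an assumption made for this example.

```python
def cost_per_million_tokens(node_price_per_hour: float, tokens_per_second: float) -> float:
    """Return the cost of generating one million tokens on a node.

    Assumes the node sustains `tokens_per_second` for the whole hour it is billed.
    """
    if node_price_per_hour < 0 or tokens_per_second <= 0:
        raise ValueError("price must be non-negative and throughput must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return node_price_per_hour / tokens_per_hour * 1_000_000


# Reported throughput for GLM5.2 on one MI355X node.
MI355X_TOKENS_PER_SECOND = 2626

# Hypothetical placeholder, NOT a reported price: substitute a real hourly node rate.
HYPOTHETICAL_NODE_PRICE_PER_HOUR = 20.0

if __name__ == "__main__":
    cost = cost_per_million_tokens(HYPOTHETICAL_NODE_PRICE_PER_HOUR, MI355X_TOKENS_PER_SECOND)
    print(f"Cost per million tokens at the placeholder price: {cost:.2f}")
```

Under this formula, cost per token scales linearly with node price and inversely with throughput, so a node at half the price that matches or exceeds a competitor's throughput yields at most half the cost per token.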

While AMD's Instinct MI350 series competes with Blackwell at the silicon level, its lack of day-0 support has historically led to performance gaps. Wafer AI has shown that kernel and model optimization can close these gaps quickly. This makes AMD a viable alternative for users who need high-performance inference without the cost of NVIDIA's high-end GPUs.
