Red Hat and NVIDIA have achieved industry-leading results in the latest MLPerf Inference v6.0 benchmarks for vision, speech, and reasoning models. The companies optimized every layer of the stack, from the RHEL kernel to the vLLM engine, with the aim of helping enterprises reduce cost per token on NVIDIA H200 and B200 GPUs.
Red Hat announced on April 2 that it collaborated with NVIDIA to deliver top performance in the MLPerf Inference v6.0 benchmarks, with results covering vision, speech, and reasoning models that position the two companies as industry leaders in those categories. According to Red Hat, the optimizations spanned every layer of the stack, from the RHEL kernel up to the vLLM engine, and target lower cost per token for enterprises running NVIDIA H200 and B200 GPUs. Red Hat encouraged readers to review the published benchmark data for details.
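The cost-per-token metric mentioned above reduces to simple arithmetic: divide the hourly cost of running a GPU by the tokens it generates per hour. Below is a minimal sketch of that calculation; the dollar rate and throughput figures are hypothetical placeholders, not numbers from the MLPerf results.

```python
def cost_per_million_tokens(gpu_hourly_usd: float, tokens_per_second: float) -> float:
    """Dollars per one million generated tokens for a single GPU.

    Both inputs are illustrative assumptions, not benchmark data.
    """
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hourly_usd / tokens_per_hour * 1_000_000

# Hypothetical example: a $4/hour GPU sustaining 5,000 tokens/s.
print(round(cost_per_million_tokens(4.0, 5000.0), 4))  # ≈ $0.22 per million tokens
```

Under this framing, a throughput gain at any layer of the stack (kernel, driver, or inference engine) translates directly into a proportionally lower cost per token at a fixed hourly GPU price.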