
On April 2, 2025, MLCommons published the industry-standard MLPerf Inference v5.0 datacenter results. Red Hat and Supermicro submitted strong results for the popular Llama2-70b model, with Red Hat OpenShift running on Supermicro's dual GPU GH200 Grace Hopper Superchip 144GB server. This was the first MLPerf submission to run OpenShift on GH200.

The published MLPerf Inference v5.0 results are available at mlcommons.org; choose the open division from the menu to see these Supermicro results.

Llama2-70b 

Meta released the Llama2-70b model on July 18, 2023. Its weights are openly available, and it is part of the very popular Llama family of models, which range from 7 billion to 70 billion parameters. In this round of MLPerf Inference Datacenter, 17 organizations submitted Llama 2 70B results, making it the most popular model of the round.

The Supermicro MLPerf v5.0 dual GPU GH200 server submission ran OpenShift 4.15 and NVIDIA TensorRT-LLM (TRT-LLM) as the serving stack. TRT-LLM uses post-training quantization to quantize Llama2-70b to FP8 precision (8-bit floating point). FP8 dramatically reduces the memory footprint and bandwidth requirements, allowing larger batch sizes and longer sequences. It also speeds up computation, at the cost of some numerical precision. The quantized model used in this submission takes advantage of the FP8 hardware support in the GH200's Hopper GPU.
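To make the idea concrete, here is a minimal sketch of per-tensor FP8 (E4M3) post-training quantization in PyTorch. It assumes PyTorch 2.1+ (for the torch.float8_e4m3fn dtype) and only illustrates the storage/precision trade-off; TRT-LLM's actual FP8 pipeline additionally calibrates activation scales and runs the matrix math on Hopper's FP8 Tensor Cores.

```python
import torch

# Minimal per-tensor FP8 (E4M3) post-training quantization sketch.
# Requires PyTorch 2.1+ for the torch.float8_e4m3fn dtype.

def quantize_fp8_e4m3(w: torch.Tensor):
    # Map the tensor's largest magnitude onto E4M3's max representable
    # value (448.0), then round into 1-byte FP8 storage.
    scale = w.abs().max() / 448.0
    w_fp8 = (w / scale).to(torch.float8_e4m3fn)
    return w_fp8, scale

def dequantize_fp8(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)            # stand-in for one weight matrix
w_fp8, scale = quantize_fp8_e4m3(w)
err = (dequantize_fp8(w_fp8, scale) - w).abs().mean()

# FP8 uses 1 byte per element vs 2 (FP16) or 4 (FP32): a 2-4x memory and
# bandwidth saving, at the cost of the rounding error printed here.
print(f"bytes/element: {w_fp8.element_size()}, mean abs error: {err:.5f}")
```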

In this submission, OpenShift 4.15 and TRT-LLM were used together; the system under test is shown here:

The hardware and software stack used in Supermicro’s MLPerf submission with OpenShift on GH200 144GB.

In addition to the excellent results submitted for MLPerf Inference v5.0, we were able to demonstrate on the Supermicro dual GPU GH200 system that OpenShift results were comparable to bare metal Red Hat Enterprise Linux (RHEL) 9.4 results. The chart below shows that OpenShift added less than 2% overhead in these four scenarios for Llama2-70b.

Tokens per second was nearly identical, within a few percentage points, on GH200 144GB with RHEL 9.4 vs GH200 144GB with OpenShift. OpenShift did not add significant overhead.

System details

ARS-111GL-NHR (144GB GPU)

Supermicro GH200 144GB system details.

Key applications 

  • High Performance Computing
  • AI/Deep Learning Training and Inference
  • Large Language Models (LLM) and Generative AI

Key features 

This system currently supports two E1.S drives (attached directly to the processor) and the onboard GPU only. Please consult your Supermicro salesperson for details.

  1. High density 1U GPU system with integrated NVIDIA® H100 GPU
  2. NVIDIA Grace Hopper™ Superchip (Grace CPU and H100 GPU)
  3. NVLink® Chip-to-Chip (C2C) high-bandwidth, low-latency interconnect between CPU and GPU at 900GB/s
  4. Up to 576GB of coherent memory per node including 480GB LPDDR5X and 96GB of HBM3 for LLM applications.
  5. 2x PCIe 5.0 x16 slots supporting NVIDIA BlueField®-3 or ConnectX®-7
  6. 9 hot-swap heavy-duty fans with optimal fan speed control
  7. Two E1.S drives connected directly to the processor

Easy to use 

It is easy to run the MLPerf benchmarks with OpenShift. Operators can be selected and installed from OperatorHub, and they automate the management of everything from storage to NVIDIA accelerated compute resources. Operators can also be installed programmatically, as sketched below.
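For illustration, here is a hedged sketch of installing an operator by creating an Operator Lifecycle Manager (OLM) Subscription with the Kubernetes Python client. The package name, channel, and catalog source shown are assumptions; check OperatorHub on your cluster for the current values.

```python
from kubernetes import client, config

config.load_kube_config()  # reuse your `oc login` context

# Assumed values: verify the package name, channel, and catalog source
# in OperatorHub before using them.
subscription = {
    "apiVersion": "operators.coreos.com/v1alpha1",
    "kind": "Subscription",
    "metadata": {"name": "gpu-operator-certified",
                 "namespace": "nvidia-gpu-operator"},
    "spec": {
        "name": "gpu-operator-certified",      # operator package (assumed)
        "channel": "stable",                   # assumed channel name
        "source": "certified-operators",
        "sourceNamespace": "openshift-marketplace",
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="operators.coreos.com",
    version="v1alpha1",
    namespace="nvidia-gpu-operator",
    plural="subscriptions",
    body=subscription,
)
```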

The OpenShift Storage Operator makes it easier to automate the process of running multiple benchmarks. It lets you create what is essentially a repository for models and then easily switch between them in your pod manifest. We loaded multiple models into storage and switched between them easily; the Storage Operator automatically provisioned storage for these models when persistent volume claims (PVCs) were created, as in the sketch below.
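A hedged sketch of that workflow using the Kubernetes Python client: the PVC below asks the Storage Operator's provisioner for a volume to hold one model. The claim name, namespace, size, and storage class are hypothetical placeholders.

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical PVC for one model; the Storage Operator provisions the
# underlying volume when this claim is created.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "llama2-70b-fp8", "namespace": "mlperf"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "200Gi"}},  # assumed size
        "storageClassName": "model-store",                # placeholder class
    },
}

client.CoreV1Api().create_namespaced_persistent_volume_claim(
    namespace="mlperf", body=pvc)
```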

The NVIDIA GPU Operator makes it easier to install the required NVIDIA drivers, container runtime components, and other libraries used to access NVIDIA GPUs on OpenShift.
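Putting the two together, a benchmark pod can mount the model PVC and request a GPU, which the GPU Operator's device plugin exposes as the nvidia.com/gpu resource. Again a hedged sketch: the pod name, namespace, and container image are placeholders, not the actual submission harness.

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical benchmark pod; the GPU Operator's device plugin exposes
# GPUs to the scheduler as the nvidia.com/gpu resource.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llama2-70b-bench", "namespace": "mlperf"},
    "spec": {
        "restartPolicy": "Never",
        "containers": [{
            "name": "harness",
            "image": "registry.example.com/mlperf-trtllm:latest",  # placeholder
            "resources": {"limits": {"nvidia.com/gpu": 1}},
            # Switching models is just a matter of pointing at another PVC.
            "volumeMounts": [{"name": "models", "mountPath": "/models"}],
        }],
        "volumes": [{
            "name": "models",
            "persistentVolumeClaim": {"claimName": "llama2-70b-fp8"},
        }],
    },
}

client.CoreV1Api().create_namespaced_pod(namespace="mlperf", body=pod)
```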

Conclusion 

Supermicro and Red Hat have demonstrated competitive performance on the MLPerf Inference v5.0 benchmarks with Llama2-70b. The results were essentially the same on RHEL 9.4 and OpenShift 4.15, showing that OpenShift adds usability and monitoring capabilities without sacrificing performance.


About the authors

Diane Feddema is a Principal Software Engineer at Red Hat on the Performance and Scale team, with a focus on AI/ML applications. She has submitted official results in multiple rounds of MLCommons MLPerf Inference and Training, dating back to the initial MLPerf rounds. Diane leads performance analysis and visualization for MLPerf benchmark submissions and collaborates with Red Hat hardware partners on joint MLPerf benchmark submissions.

Diane has a BS and MS in Computer Science and is presently co-chair of the Best Practices group of the MLPerf consortium.


Nikola Nikolov is an AI/HPC solutions engineer at Supermicro. Nikola received a PhD in Nuclear Physics from the University of Tennessee, Knoxville, focused on large-scale HPC computations in nuclear astrophysics at Oak Ridge National Laboratory under a National Nuclear Security Administration (NNSA) Stewardship grant.

Before joining industry, he spent several years in academia designing experiments with the CERN ISOLDE collaboration and working on cosmic neutrino detection with Los Alamos National Laboratory.

Prior to Supermicro, Nikola worked at KLA Corporation (formerly KLA-Tencor) as a big data and ML developer in the semiconductor industry. He designed HBase, Bigtable, and data lake infrastructures for anomaly detection and predictive failure analysis of semiconductor equipment. These big data systems have been successfully adopted by major chip manufacturers such as TSMC, Samsung, and SK Hynix.

Nikola has published peer-reviewed academic articles in top scientific journals such as Physical Review Letters and Nature, as well as engineering papers in big data management.

For the last eight years he has focused mainly on public and hybrid cloud solutions with AWS and Google Cloud Platform. At Supermicro, Nikola works mostly on designing cutting-edge AI/HPC infrastructure solutions and validating AI/HPC systems via MLPerf and HPC benchmarking.

Read full bio

Arpitha Srinivas is an AI Systems Performance Engineer at Supermicro, specializing in optimizing large-scale AI models such as Llama 4, Llama 3.1, Llama 2-70B, BERT, Stable Diffusion, ResNet, and 3D-UNet across NVIDIA (B300, B200, H200, GH200, H100), AMD (MI350, MI325X, MI300X), and Intel (Xeon) platforms. With a master's degree in AI Software Engineering from San José State University, she brings deep expertise in TensorRT, vLLM, CUDA, ROCm, and MLPerf benchmarking. She has successfully submitted multiple rounds of MLCommons MLPerf Inference and Training benchmarks. Her research paper, "Cyber-Security Dashboard: An Extensible Intrusion Detection System for Distributed Control Systems," was accepted at the IEEE SVCC 2025 conference. Arpitha is a passionate advocate for women in technology and has contributed to several industry conferences. She also reviews books for O'Reilly Media. When not tuning AI systems and models, she enjoys hiking, music, and painting.
