
Enterprise AI has reached a turning point. For years, organizations have invested heavily in artificial intelligence, launching countless pilot projects and experimental models. Many AI projects show promise, but scaling these successes, integrating AI into core operations, and consistently extracting value across the enterprise remain significant challenges.

This is why an open, collaboratively developed, standard AI operating system (AI OS) is vital. Built on open source technology, this system will provide the essential production runtime environment for customizing and running AI models. It enables a "train once, infer repeatedly" approach, helping shift AI from isolated experiments to widespread adoption.

To understand its potential impact, consider the Linux revolution. Decades ago, we built a standard version of Linux that worked with a wide array of hardware and applications, providing a reliable and flexible foundation that helped fuel innovation across industries. 

Building a standard AI OS will do the same, streamlining the deployment and management of AI, unlocking significant business value from current and future AI investments. This is more than a technological shift—it is a strategic necessity for both AI leaders and the AI industry as a whole.

Why AI projects struggle to scale

Despite the promise of AI, many enterprises encounter a significant production bottleneck. Moving from a successful proof of concept to a fully operational, scalable AI deployment is often difficult, hindering widespread adoption and limiting potential return on investment (ROI).

One of the primary hurdles is the "do-it-yourself" challenge. Organizations frequently find themselves building bespoke scaffolding and custom frameworks for AI inference. This results in fragmented, one-off solutions that are difficult to maintain, integrate, and scale across departments or use cases. Each new AI initiative tends to develop its own unique set of tools and processes, creating a complex and inefficient environment.

Adding to this issue is hardware and model fragmentation. The explosion of AI models and the rapid spread of specialized AI accelerators—from GPUs to custom application-specific integrated circuits (ASICs)—highlight a critical gap: a common, efficient execution layer. This heterogeneity creates significant operational complexity, making it difficult to optimize performance and interoperability.

Finally, inefficient resource use remains a persistent problem. Expensive AI hardware, particularly GPUs, often sits idle. To get full value from these investments, an AI OS must be able to dynamically allocate and deallocate resources, optimize workloads, and maximize throughput. Without such a system, the economic viability of large-scale AI deployments is severely restricted.

What is the AI OS?

The AI OS, in this context, is not a new operating system built from scratch. Instead, it is an emergent standardized AI layer built on existing, robust infrastructure and technologies. It provides a common platform for managing and optimizing AI inference workloads at scale. Its goal is to abstract away much of the underlying complexity, providing a unified environment for deploying and running AI models in production. It builds on well-established open source technologies, including:

Kubernetes: Orchestrating distributed AI

Enterprises already rely on Kubernetes for orchestrating production applications. It also offers the scalability, security, provisioning, and multi-tenancy needed to manage complex and dynamic AI environments. In the AI OS, Kubernetes acts as the control plane, deploying AI workloads efficiently and reliably across distributed infrastructure.
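As an illustration of what that control plane looks like in practice, the following sketch uses the Kubernetes Python client to declare a small inference deployment. The container image, model name, namespace, and GPU count are illustrative placeholders, not a prescribed configuration.

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside a cluster

# A container running an OpenAI-compatible vLLM server for an example open model.
container = client.V1Container(
    name="llm-server",
    image="vllm/vllm-openai:latest",                       # illustrative image tag
    args=["--model", "meta-llama/Llama-3.1-8B-Instruct"],  # illustrative model
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(limits={"nvidia.com/gpu": "1"}),
)

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="llm-inference"),
    spec=client.V1DeploymentSpec(
        replicas=2,  # Kubernetes keeps two replicas running and reschedules them on failure
        selector=client.V1LabelSelector(match_labels={"app": "llm-inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-inference"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="ai-workloads", body=deployment)
```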

vLLM: The kernel of AI inference

Within the AI OS, vLLM serves as the core runtime, supporting the leading large language models (LLMs) and keeping them performant under demanding workloads. It runs optimized models across heterogeneous accelerators, addressing the fragmentation challenge with a high-performance, unified execution layer for complex inference tasks.
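For a concrete sense of that runtime, here is a minimal offline-inference sketch using vLLM's Python API. The model name is an illustrative placeholder; vLLM handles batching and KV-cache management behind the single generate call.

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM manages continuous batching and the KV cache internally.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(
    ["Summarize why a standard AI operating system matters for enterprises."],
    params,
)

for output in outputs:
    print(output.outputs[0].text)
```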

llm-d: Distributed inference at scale

The standard AI OS builds on the llm-d open source project, incorporating key technical innovations to enable production-grade AI at scale.

Distributed inference capabilities go far beyond simple model replication. The AI OS enables a single model to run efficiently across multiple GPUs and servers. This allows for horizontal scaling of individual models, improving throughput and resilience for high-demand applications.
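As a sketch of what sharding a single model looks like, vLLM exposes tensor and pipeline parallelism settings; the model and degree of parallelism below are illustrative, not a recommended topology.

```python
from vllm import LLM, SamplingParams

# Shard one large model across four GPUs (tensor parallelism) instead of
# replicating a smaller model; pipeline parallelism can extend this across nodes.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",  # illustrative large model
    tensor_parallel_size=4,                      # split each layer across 4 GPUs
    # pipeline_parallel_size=2,                  # optionally span multiple nodes
)

print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)
```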

Another critical component for scaling LLM inference is the distributed key-value (KV) cache. This innovation offers increased flexibility, improves service level objectives (SLOs), and delivers more tokens per unit of infrastructure. By intelligently managing the KV cache across distributed resources, the AI OS can significantly boost the efficiency and responsiveness of LLM deployments.
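To make the idea concrete, here is a toy sketch of a prefix-aware cache index: prompts are hashed block by block, so the platform can tell which worker already holds the KV blocks for a shared prefix such as a common system prompt. This is a conceptual illustration only, not llm-d's actual data structures.

```python
import hashlib

BLOCK_SIZE = 16  # tokens per KV block (illustrative)

class PrefixCacheIndex:
    """Toy index of which worker holds the KV blocks for a given prompt prefix."""

    def __init__(self):
        self.index = {}  # block-chain hash -> set of worker ids

    def _block_hashes(self, token_ids):
        hashes, running = [], hashlib.sha256()
        full_blocks = len(token_ids) // BLOCK_SIZE * BLOCK_SIZE
        for start in range(0, full_blocks, BLOCK_SIZE):
            running.update(str(token_ids[start:start + BLOCK_SIZE]).encode())
            hashes.append(running.hexdigest())  # each hash covers all preceding blocks
        return hashes

    def record(self, token_ids, worker_id):
        """Remember that worker_id now holds KV blocks for this prompt."""
        for h in self._block_hashes(token_ids):
            self.index.setdefault(h, set()).add(worker_id)

    def cached_blocks(self, token_ids, worker_id):
        """How many leading blocks of this prompt the worker can reuse."""
        hits = 0
        for h in self._block_hashes(token_ids):
            if worker_id in self.index.get(h, set()):
                hits += 1
            else:
                break
        return hits

# A shared system prompt cached on worker "gpu-0" is reusable for a longer request.
idx = PrefixCacheIndex()
system_prompt = list(range(32))
idx.record(system_prompt, "gpu-0")
print(idx.cached_blocks(system_prompt + list(range(100, 116)), "gpu-0"))  # -> 2
```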

Intelligent routing and scheduling optimize inference request placement, going beyond simple least-load balancing. The scheduler considers the state of the KV cache when routing requests, directing inference tasks to the most appropriate and efficient resources, improving resource utilization and reducing latency.
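A minimal scoring sketch of that idea follows: each replica is scored by how much of the incoming prompt it already has cached versus how busy it is, and the request goes to the highest-scoring replica. The weights and data shapes are invented for illustration and are not taken from llm-d.

```python
def pick_replica(replicas):
    """replicas: {replica_id: (cached_prefix_blocks, queue_depth)} -- toy KV-aware routing."""
    def score(item):
        _, (cache_hits, queue_depth) = item
        # Weight cache reuse above raw load so warm replicas win unless they are overloaded.
        return 2.0 * cache_hits - queue_depth
    return max(replicas.items(), key=score)[0]

# "b" wins despite a deeper queue because it already holds most of the prompt's KV blocks.
print(pick_replica({"a": (0, 2), "b": (5, 3), "c": (1, 1)}))
```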

Beyond traditional optimization methods, performance optimization within the AI OS uses advanced quantization techniques. These techniques are mapped to specific hardware generations and executed with efficient kernels tuned for each. This hardware-aware approach enables models to run with optimal speed and efficiency on the latest AI accelerators.
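As a hedged example of how this looks to an operator, vLLM can load a pre-quantized checkpoint and select kernels suited to the underlying accelerator. The model name below is an illustrative FP8 checkpoint, and the setting assumes an accelerator generation with native FP8 support.

```python
from vllm import LLM, SamplingParams

# Serve a pre-quantized FP8 checkpoint; vLLM picks hardware-specific kernels
# when the accelerator generation supports FP8 natively.
llm = LLM(
    model="neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8",  # illustrative quantized model
    quantization="fp8",
)

print(llm.generate(["Ping"], SamplingParams(max_tokens=8))[0].outputs[0].text)
```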

Shaping the future of AI workloads

The emergence of the standard AI OS is not just about optimizing current AI deployments. It is foundational to the evolution of AI workloads and their increasing complexity.

It enables agentic AI workflows. As enterprises orchestrate multiple AI models to work collaboratively on complex tasks, the AI OS becomes essential, providing the robust scheduling, efficient resource sharing, and distributed infrastructure these workflows require.

The AI OS is also crucial for addressing inference-time scaling. As AI moves from solely data-driven model improvement to more complex inference-time reasoning, the computational burden on the platform increases. The AI OS delivers the performance and optimized resource utilization needed to make these computationally intensive reasoning models economically viable.

The power of open source and ecosystem collaboration

Open source principles and broad ecosystem collaboration will significantly accelerate the development and widespread adoption of a standard AI OS. This aligns with the "coopetition" model used to build Linux and its ecosystem, in which organizations work to build foundational technology in open, collaborative communities, then compete commercially on various solutions. This approach drives innovation and establishes common standards.

Cross-industry engagement is also crucial. Developing a standard AI OS requires broad involvement from hardware vendors, model providers, server manufacturers, and AI platform developers. This collaborative environment promotes interoperability, helps prevent vendor lock-in, and fosters a rich ecosystem of compatible technologies.

A common, open, standard AI OS will help individual enterprises avoid repeatedly "re-inventing the wheel." This will help accelerate AI's impact across industries, enabling organizations to focus on building unique business value on top of a standardized, high-performance AI infrastructure.

Red Hat's role

We believe that Red Hat plays a pivotal role in the development of the standard AI OS, building foundational components for scalable, production-grade AI through our open source, hybrid cloud, and enterprise infrastructure expertise. All of this follows the guiding principle of supporting any model, any accelerator, any cloud.

Red Hat AI forms a core part of this vision. It offers integrated tools and runtimes necessary for building, deploying, and managing AI models—from training to inference—across hybrid environments.

  • Red Hat AI Inference Server helps organizations efficiently deploy and scale AI models across their hybrid cloud infrastructure. It offers a high-performance, unified platform for running AI inference on various hardware, from the data center to the edge. It includes a hardened vLLM serving engine, intelligent LLM compression tools, and an optimized model repository, all designed to accelerate AI adoption and improve operational efficiency.

  • Red Hat Enterprise Linux AI (RHEL AI) is a foundation model platform for LLM development, testing, and deployment with optimized inference capabilities. It brings together InstructLab model alignment tools, a bootable image of RHEL that includes popular AI libraries, and hardware-optimized inference for various accelerators.

  • Red Hat OpenShift AI offers a unified AI/ML platform, providing a comprehensive environment for building, deploying, and managing AI models, including LLMs and MLOps pipelines. This platform optimizes hardware utilization, maximizing the return on expensive AI infrastructure.

Beyond these, our hybrid cloud strategy enables flexible AI deployments that simplify data sovereignty and improve an organization's security posture. Enterprises can deploy AI models where their data resides, meeting compliance requirements. They can also leverage built-in enterprise-grade security and governance features across on-premise, public cloud, and edge environments.

Red Hat also empowers the AI workforce. We help address the AI talent gap through expert consulting, co-creation services, and comprehensive training and certification programs. This helps organizations build the skills they need to leverage advanced AI technologies both now and in the future.

Finally, Red Hat collaborates with a wide variety of hardware vendors and technology partners, including Google, IBM Research, NVIDIA, AMD, Intel, Hugging Face and others. This fosters an open, integrated, and vendor-neutral AI ecosystem, helping drive innovation and prevent the "DIY chaos" that often plagues early-stage technology adoption.

The path forward for enterprise AI leaders

The standard AI OS is more than a technical evolution—it is a strategic necessity for IT leaders aiming to harness AI's full potential. By standardizing the runtime platform for AI models, the AI OS will help unlock unprecedented levels of efficiency, scalability, and innovation. Open source principles and community collaboration will accelerate this future, helping provide a robust, flexible foundation for scalable, efficient, and transformative AI deployments.

Learn more in this episode of Technically Speaking: Scaling AI inference with open source.

