AI Hypercomputer logo

Train, tune, and serve on an AI supercomputer

AI Hypercomputer is the integrated supercomputing system underneath every AI workload on Google Cloud. It is made up of hardware, software, and consumption models designed to simplify AI deployment, improve system-level efficiency, and optimize costs.

Overview

AI-optimized hardware

Choose from compute, storage, and networking options optimized for granular, workload-level objectives, whether that's higher throughput, lower latency, faster time-to-results, or lower TCO. Learn more about: Google Cloud TPU, Google Cloud GPU, plus the latest in storage and networking.

Leading software, open frameworks

Get more from your hardware with industry-leading software, integrated with open frameworks, libraries, and compilers to make AI development, integration, and management more efficient.

Flexible consumption models

Flexible consumption options let you choose between fixed costs with committed use discounts and dynamic on-demand models to meet your business needs. Dynamic Workload Scheduler and Spot VMs can help you get the capacity you need without overallocating. Plus, Google Cloud's cost optimization tools help automate resource utilization to reduce manual tasks for engineers.
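As a small illustration, here is a sketch of requesting Spot capacity programmatically with the google-cloud-compute Python client. The project, zone, machine type, and image below are hypothetical placeholders; a real workload would typically also attach accelerators and handle preemption.

```python
from google.cloud import compute_v1

# Hypothetical placeholders for illustration.
PROJECT = "my-project"
ZONE = "us-central1-a"

# A Spot VM uses the same machine shapes as on-demand capacity,
# but can be reclaimed by Compute Engine in exchange for a deep discount.
instance = compute_v1.Instance(
    name="spot-batch-worker",
    machine_type=f"zones/{ZONE}/machineTypes/n2-standard-8",
    scheduling=compute_v1.Scheduling(
        provisioning_model="SPOT",           # request Spot rather than standard capacity
        instance_termination_action="STOP",  # what to do when the VM is reclaimed
    ),
    disks=[
        compute_v1.AttachedDisk(
            boot=True,
            auto_delete=True,
            initialize_params=compute_v1.AttachedDiskInitializeParams(
                source_image="projects/debian-cloud/global/images/family/debian-12",
            ),
        )
    ],
    network_interfaces=[compute_v1.NetworkInterface(network="global/networks/default")],
)

operation = compute_v1.InstancesClient().insert(
    project=PROJECT, zone=ZONE, instance_resource=instance
)
print(operation.status)
```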

How It Works

Google is a leader in artificial intelligence, having invented technologies like TensorFlow. Did you know you can use this same technology for your own projects? Learn about Google's history of innovation in AI infrastructure and how you can leverage it for your workloads.

Google Cloud AI Hypercomputer architecture diagram

Common Uses

Run large-scale AI training and pre-training

Powerful, scalable, and efficient AI training

Training workloads need to run as highly synchronized jobs across thousands of nodes in tightly coupled clusters. A single degraded node can disrupt an entire job, delaying time-to-market. You need to:

  • Ensure the cluster is set up quickly and tuned for the workload in question
  • Predict failures and troubleshoot them quickly
  • Continue a workload even when failures do happen (see the checkpointing sketch after this list)

We want to make it extremely easy for customers to deploy and scale training workloads on Google Cloud.
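One part of continuing through failures is frequent checkpointing, so an interrupted job can resume from its last good step rather than restarting from scratch. Below is a minimal, framework-agnostic sketch; the checkpoint directory (for example, a Cloud Storage bucket mounted into the job) is an assumption.

```python
import os
import pickle

# Assumed path, e.g. a Cloud Storage bucket mounted into the job.
CKPT_DIR = "/mnt/checkpoints"

def save_checkpoint(step, state):
    """Write state atomically so a restart never sees a half-written file."""
    tmp = os.path.join(CKPT_DIR, f"step_{step:08d}.tmp")
    final = os.path.join(CKPT_DIR, f"step_{step:08d}.ckpt")
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    os.replace(tmp, final)

def latest_checkpoint():
    """Return the newest checkpoint dict, or None when starting fresh."""
    ckpts = sorted(f for f in os.listdir(CKPT_DIR) if f.endswith(".ckpt"))
    if not ckpts:
        return None
    with open(os.path.join(CKPT_DIR, ckpts[-1]), "rb") as f:
        return pickle.load(f)

# Resume from the last good step if a previous attempt was interrupted.
resume = latest_checkpoint()
state = resume["state"] if resume else {"weights": None}
start_step = resume["step"] + 1 if resume else 0

for step in range(start_step, 10_000):
    # ... run one training step, updating `state` ...
    if step % 500 == 0:
        save_checkpoint(step, state)
```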


To create an AI cluster, get started with one of our tutorials.
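As a rough sketch of what those tutorials walk through, the snippet below creates a small GKE cluster with an attached GPU accelerator using the google-cloud-container Python client. The project, location, machine type, and accelerator type are hypothetical placeholders, and production clusters usually add separate node pools, autoscaling, and networking settings.

```python
from google.cloud import container_v1

# Hypothetical placeholders for illustration.
PARENT = "projects/my-project/locations/us-central1-a"

cluster = container_v1.Cluster(
    name="ai-training-cluster",
    initial_node_count=2,
    node_config=container_v1.NodeConfig(
        machine_type="g2-standard-8",
        accelerators=[
            container_v1.AcceleratorConfig(
                accelerator_count=1,
                accelerator_type="nvidia-l4",
            )
        ],
    ),
)

client = container_v1.ClusterManagerClient()
operation = client.create_cluster(
    request=container_v1.CreateClusterRequest(parent=PARENT, cluster=cluster)
)
print(operation.name)  # long-running operation; poll it until the cluster is ready
```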

Character.AI leverages Google Cloud to scale up

      "We need GPUs to generate responses to users' messages. And as we get more users on our platform, we need more GPUs to serve them. So on Google Cloud, we can experiment to find what is the right platform for a particular workload. It's great to have that flexibility to choose which solutions are most valuable." Myle Ott, Founding Engineer, Character.AI

      Deploy and orchestrate AI applications

Leverage leading AI orchestration software and open frameworks to deliver AI-powered experiences

      Google Cloud provides images that contain common operating systems, frameworks, libraries, and drivers. AI Hypercomputer optimizes these pre-configured images to support your AI workloads.

• AI and ML frameworks and libraries: Use Deep Learning Software Layer (DLSL) Docker images to run ML models such as NeMo and MaxText on a Google Kubernetes Engine (GKE) cluster.
• Cluster deployment and AI orchestration: You can deploy your AI workloads on GKE clusters, Slurm clusters, or Compute Engine instances. For more information, see the VM and cluster creation overview; a minimal job-submission sketch follows this list.
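For instance, once a GKE cluster exists, a containerized training job that requests a GPU can be submitted with the standard Kubernetes Python client. The image, command, and resource names below are hypothetical placeholders.

```python
from kubernetes import client, config

# Assumes cluster credentials are already configured,
# e.g. via `gcloud container clusters get-credentials`.
config.load_kube_config()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="gpu-training-job"),
    spec=client.V1JobSpec(
        backoff_limit=2,  # retry a couple of times if a node fails
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="trainer",
                        # Placeholder image; substitute the framework or DLSL image you use.
                        image="us-docker.pkg.dev/my-project/repo/trainer:latest",
                        command=["python", "train.py"],
                        resources=client.V1ResourceRequirements(
                            limits={"nvidia.com/gpu": "1"}  # request one GPU
                        ),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```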


      Explore software resources

      Priceline: Helping travelers curate unique experiences

      "Working with Google Cloud to incorporate generative AI allows us to create a bespoke travel concierge within our chatbot. We want our customers to go beyond planning a trip and help them curate their unique travel experience." Martin Brodbeck, CTO, Priceline


      Cost-effectively serve models at scale

      Maximize price-performance and reliability for inference workloads

      Inference is quickly becoming more diverse and complex, evolving in three main areas:

      • First, how we interact with AI is changing. Conversations now have much longer and more diverse context.
      • Second, sophisticated reasoning and multi-step inference are making Mixture-of-Experts (MoE) models more common. This is redefining how memory and compute scale from initial input to final output.
• Finally, it's clear that the real value isn't just about raw tokens per dollar, but the usefulness of the response. Does the model have the right expertise? Did it answer a critical business question correctly? That's why we believe customers need better measurements, focusing on the total cost of system operations, not the price of their processors. (An illustrative calculation follows this list.)
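As a toy illustration of that last point, the numbers below are invented, but they show how a metric like cost per useful response can diverge from raw token price once system costs and answer quality are included.

```python
# Invented numbers for illustration only; not pricing or benchmark data.
tokens_per_response = 1_500
price_per_million_tokens = 0.60      # USD (hypothetical)
responses_per_hour = 20_000
system_cost_per_hour = 35.00         # orchestration, storage, networking, ops (hypothetical)
useful_fraction = 0.85               # share of responses judged correct/useful

token_cost_per_hour = responses_per_hour * tokens_per_response * price_per_million_tokens / 1e6
total_cost_per_hour = token_cost_per_hour + system_cost_per_hour

cost_per_response = total_cost_per_hour / responses_per_hour
cost_per_useful_response = total_cost_per_hour / (responses_per_hour * useful_fraction)

print(f"cost per response:        ${cost_per_response:.4f}")
print(f"cost per useful response: ${cost_per_useful_response:.4f}")
```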


      Explore AI Inference resources

AssemblyAI leverages Google Cloud for cost efficiency

        "Our experimental results show that Cloud TPU v5e is the most cost-efficient accelerator on which to run large-scale inference for our model. It delivers 2.7x greater performance per dollar than G2 and 4.2x greater performance per dollar than A2 instances." Domenic Donato,

        VP of Technology, AssemblyAI



        Open source models on Google Cloud

        Serve a model with GKE on a single GPU

        Train common models with GPUs

        Scale model serving to multiple GPUs

        Serve an LLM using multi-host TPUs on GKE with Saxml

Train at scale with the NVIDIA NeMo framework

        FAQ

        Is AI Hypercomputer the easiest way to get started with AI Workloads on Google Cloud?

For most customers, a managed AI platform like Vertex AI is the easiest way to get started with AI because it has all of the tools, templates, and models built in, and it is powered by AI Hypercomputer under the hood in a way that is optimized on your behalf. If you prefer to configure and optimize every component of your infrastructure, you can access AI Hypercomputer's components directly as infrastructure and assemble them in a way that meets your needs.

How is AI Hypercomputer different from using individual Google Cloud services on their own?

While individual services offer specific capabilities, AI Hypercomputer provides an integrated system where hardware, software, and consumption models are designed to work optimally together. This integration delivers system-level efficiencies in performance, cost, and time-to-market that are harder to achieve by stitching together disparate services. It simplifies complexity and provides a holistic approach to AI infrastructure.



Can I use AI Hypercomputer as part of a hybrid or multi-cloud strategy?

Yes, AI Hypercomputer is designed with flexibility in mind. Technologies like Cross-Cloud Interconnect provide high-bandwidth connectivity to on-premises data centers and other clouds, facilitating hybrid and multi-cloud AI strategies. We operate with open standards and integrate popular third-party software to enable you to build solutions that span multiple environments and change services as you please.

How does AI Hypercomputer handle security?

Security is a core aspect of AI Hypercomputer. It benefits from Google Cloud's multi-layered security model. Specific features include Titan security microcontrollers (ensuring systems boot from a trusted state), RDMA Firewall (for zero-trust networking between TPUs/GPUs during training), and integration with solutions like Model Armor for AI safety. These are complemented by robust infrastructure security policies and principles like the Secure AI Framework.

Which orchestration option should I choose?

• If you don't want to manage VMs, we recommend starting with Google Kubernetes Engine (GKE)
• If you need to use multiple schedulers, or can't use GKE, we recommend using Cluster Director
• If you want complete control over your infrastructure, the only way to achieve that is by working directly with VMs, and for that, Google Compute Engine is your best option


Is AI Hypercomputer only for large workloads?

No. AI Hypercomputer can be used for workloads of any size. Smaller workloads still realize all the benefits of an integrated system, such as efficiency and simplified deployment. AI Hypercomputer also supports customers as their businesses scale, from small proofs of concept and experiments to large-scale production deployments.

Are there example recipes or reference blueprints I can start from?

Yes, we are building a library of recipes on GitHub. You can also use the Cluster Toolkit for pre-built cluster blueprints.

What technologies make up AI Hypercomputer?

AI-optimized hardware

        Storage

        • Training: Managed Lustre is ideal for demanding AI training with high throughput and PB-scale capacity. GCS Fuse (optionally with Anywhere Cache) suits larger capacity needs with more relaxed latency. Both integrate with GKE and Cluster Director.
• Inference: GCS Fuse with Anywhere Cache offers a simple solution. For higher performance, consider Hyperdisk ML. If using Managed Lustre for training in the same zone, it can also be used for inference. (A minimal data-access sketch follows this list.)
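As a small sketch of the data path, a bucket mounted with GCS Fuse is read like a local filesystem, while the same objects can also be streamed directly with the Cloud Storage client library. The bucket name, mount point, and prefix below are hypothetical placeholders.

```python
import os
from google.cloud import storage

# Option 1: read training shards through a GCS Fuse mount (assumed mount point).
MOUNT = "/mnt/gcs/datasets/shards"
for fname in sorted(os.listdir(MOUNT)):
    with open(os.path.join(MOUNT, fname), "rb") as f:
        shard = f.read()
        # ... feed shard into the input pipeline ...

# Option 2: stream the same objects with the Cloud Storage client library.
client = storage.Client()
for blob in client.list_blobs("my-training-data", prefix="datasets/shards/"):
    with blob.open("rb") as f:
        shard = f.read()
        # ... feed shard into the input pipeline ...
```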

        Networking

        • Training: Benefit from technologies like RDMA networking in VPCs, and high-bandwidth Cloud and Cross-Cloud Interconnect for rapid data transfer.
        • Inference: Utilize solutions like the GKE Inference Gateway and enhanced Cloud Load Balancing for low-latency serving. Model Armor can be integrated for AI safety and security.

        Compute: Access Google Cloud TPUs (Trillium), NVIDIA GPUs (Blackwell), and CPUs (Axion). This allows for optimization based on specific workload needs for throughput, latency, or TCO.

        Leading software and open frameworks

        • ML Frameworks and Libraries: PyTorch, JAX, TensorFlow, Keras, vLLM, JetStream, MaxText, LangChain, Hugging Face, NVIDIA (CUDA, NeMo, Triton), and many more open source and third party options.
• Compilers, Runtimes and Tools: XLA (for performance and interoperability), Pathways on Cloud, Multislice Training, Cluster Toolkit (for pre-built cluster blueprints), and many more open source and third party options. (A minimal XLA-compiled JAX example follows this list.)
        • Orchestration: Google Kubernetes Engine (GKE), Cluster Director (for Slurm, non-managed Kubernetes, BYO schedulers), and Google Compute Engine (GCE).
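As a tiny example of the framework and compiler layers working together, the same JAX code is compiled by XLA and runs unchanged on whichever accelerator backs the runtime (TPU, GPU, or CPU).

```python
import jax
import jax.numpy as jnp

# jax.jit hands this function to XLA, which compiles it for the available accelerator.
@jax.jit
def predict(params, x):
    w, b = params
    return jnp.tanh(x @ w + b)

key = jax.random.PRNGKey(0)
params = (jax.random.normal(key, (128, 64)), jnp.zeros(64))
x = jax.random.normal(key, (32, 128))

print(jax.devices())             # the TPU, GPU, or CPU devices JAX can see
print(predict(params, x).shape)  # (32, 64), computed by the XLA-compiled function
```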

        Consumption models:

        • On Demand: Pay-as-you-go.
        • Committed Use Discounts (CUDs): Save significantly (up to 70%) for long-term commitments.
        • Spot VMs: Ideal for fault-tolerant batch jobs, offering deep discounts (up to 91%).
• Dynamic Workload Scheduler (DWS): Save up to 50% for batch/fault-tolerant jobs. (An illustrative cost comparison follows this list.)
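To make the trade-off concrete, here is an illustrative monthly comparison using the maximum discounts listed above. The hourly rate is a made-up placeholder, and real discounts vary by resource, region, and term.

```python
# Illustrative only; the hourly rate is hypothetical and actual discounts vary.
on_demand_hourly = 10.00   # USD/hour for some accelerator VM (placeholder)
hours_per_month = 730

options = {
    "On Demand": 0.00,
    "Committed use discount (up to 70%)": 0.70,
    "Dynamic Workload Scheduler (up to 50%)": 0.50,
    "Spot VMs (up to 91%)": 0.91,
}

for name, discount in options.items():
    monthly = on_demand_hourly * (1 - discount) * hours_per_month
    print(f"{name:40s} ~${monthly:,.0f}/month")
```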