Introducing A3 supercomputers with NVIDIA H100 GPUs | Google Cloud Blog (2024)

Implementing state-of-the-art artificial intelligence (AI) and machine learning (ML) models requires large amounts of computation, both to train the underlying models, and to serve those models once they’re trained. Given the demands of these workloads, a one-size-fits-all approach is not enough — you need infrastructure that’s purpose-built for AI.

Together with our partners, we offer a wide range of compute options for ML use cases such as large language models (LLMs), generative AI, and diffusion models. Recently, we announced G2 VMs, making Google Cloud the first cloud provider to offer NVIDIA's new L4 Tensor Core GPUs for serving generative AI workloads. Today, we're expanding that portfolio with the private preview launch of the next-generation A3 GPU supercomputer. Google Cloud now offers a complete range of GPU options for training and inference of ML models.

Google Compute Engine A3 supercomputers are purpose-built to train and serve the most demanding AI models that power today's generative AI and large language model innovation. Our A3 VMs combine NVIDIA H100 Tensor Core GPUs and Google's leading networking advancements to serve customers of all sizes.

As companies transition from training to serving their ML models, A3 VMs are also a strong fit for inference workloads, delivering up to a 30x inference performance boost compared to our A2 VMs, which are powered by the NVIDIA A100 Tensor Core GPU.*

Purpose-built for performance and scale

A3 GPU VMs were purpose-built to deliver the highest-performance training for today's ML workloads, complete with a modern CPU, improved host memory, next-generation NVIDIA GPUs, and major network upgrades. Here are the key features of the A3:

  • 8 H100 GPUs utilizing NVIDIA's Hopper architecture, delivering 3x the compute throughput of the prior-generation A100

  • 3.6 TB/s bisection bandwidth between A3's 8 GPUs via NVIDIA NVSwitch and NVLink 4.0

  • Next-generation 4th Gen Intel Xeon Scalable processors

  • 2 TB of host memory via 4800 MHz DDR5 DIMMs

  • 10x greater networking bandwidth than our prior-generation A2 VMs, powered by our custom-designed 200 Gbps hardware-enabled IPUs, a specialized inter-server GPU communication stack, and NCCL optimizations
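As a back-of-the-envelope check on the NVLink figure above (a sketch only; the 450 GB/s per-direction NVLink 4.0 bandwidth per H100 comes from NVIDIA's public specifications, not from this post):

```python
# Rough arithmetic behind the 3.6 TB/s figure: with NVSwitch, each of the
# 8 H100 GPUs can drive its full NVLink bandwidth across any bisection of
# the GPU set (4 GPUs on each side of the cut, counting both directions).
NVLINK4_PER_GPU_GB_S = 450   # NVLink 4.0 per-GPU bandwidth, one direction (GB/s)
GPUS_PER_VM = 8

gpus_per_side = GPUS_PER_VM // 2          # 4 GPUs on each side of the bisection
bisection_tb_s = NVLINK4_PER_GPU_GB_S * gpus_per_side * 2 / 1000
print(bisection_tb_s)  # 3.6
```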

A3 GPU VMs are a step forward for customers developing the most advanced ML models. By considerably speeding up the training and inference of ML models, A3 VMs enable businesses to train more complex ML models quickly, creating an opportunity for our customers to build large language models (LLMs), generative AI, and diffusion models that help optimize operations and stay ahead of the competition.

This announcement builds on our partnership with NVIDIA to offer a full range of GPU options for training and inference of ML models to our customers.

“Google Cloud's A3 VMs, powered by next-generation NVIDIA H100 GPUs, will accelerate training and serving of generative AI applications,” said Ian Buck, vice president of hyperscale and high performance computing at NVIDIA. “On the heels of Google Cloud’s recently launched G2 instances, we're proud to continue our work with Google Cloud to help transform enterprises around the world with purpose-built AI infrastructure.”

Fully-managed AI infrastructure optimized for performance and cost

For customers looking to develop complex ML models without managing the underlying infrastructure, A3 VMs can be deployed on Vertex AI, an end-to-end platform for building ML models on fully managed infrastructure that's purpose-built for low-latency serving and high-performance training. Today, at Google I/O 2023, we're pleased to build on these offerings by opening generative AI support in Vertex AI to more customers and by introducing new features and foundation models.
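As a sketch of what an A3-backed custom training job on Vertex AI might look like, the worker pool spec could be assembled as below. The field names follow Vertex AI's CustomJob worker-pool schema, but the machine type `a3-highgpu-8g`, the accelerator enum value, and the image URI are illustrative assumptions — verify them against the current Vertex AI documentation before use:

```python
# Hypothetical worker-pool spec for a Vertex AI CustomJob on an A3 VM.
# The machine/accelerator values are assumptions, not confirmed names.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "a3-highgpu-8g",         # assumed A3 machine type
            "accelerator_type": "NVIDIA_H100_80GB",  # assumed H100 enum value
            "accelerator_count": 8,                  # all 8 GPUs on the VM
        },
        "replica_count": 1,
        "container_spec": {
            # Placeholder training image; replace with your own.
            "image_uri": "gcr.io/my-project/my-training-image:latest",
        },
    }
]
```

A spec like this would typically be passed to `aiplatform.CustomJob(..., worker_pool_specs=worker_pool_specs)` from the `google-cloud-aiplatform` SDK.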

Customers looking to architect their own custom software stack can also deploy A3 VMs on Google Kubernetes Engine (GKE) and Compute Engine to train and serve the latest foundation models, while enjoying support for autoscaling, workload orchestration, and automatic upgrades.
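On GKE, requesting all eight GPUs on an A3 node might look like the minimal Pod manifest below, expressed here as a Python dict for clarity. The `cloud.google.com/gke-accelerator` node-selector key is a standard GKE label, but the `nvidia-h100-80gb` value and the image are assumptions — confirm the exact accelerator label for A3 node pools in the GKE documentation:

```python
# Minimal Pod manifest (as a Python dict) requesting all 8 H100 GPUs on a
# GKE A3 node. Node-selector value and image are illustrative placeholders.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "h100-training"},
    "spec": {
        # Schedule onto a node pool with H100 accelerators (assumed label value).
        "nodeSelector": {"cloud.google.com/gke-accelerator": "nvidia-h100-80gb"},
        "containers": [
            {
                "name": "trainer",
                "image": "gcr.io/my-project/trainer:latest",  # placeholder image
                # GPUs are requested via the extended resource nvidia.com/gpu.
                "resources": {"limits": {"nvidia.com/gpu": 8}},
            }
        ],
    },
}
```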

“Google Cloud's A3 VM instances provide us with the computational power and scale for our most demanding training and inference workloads. We're looking forward to taking advantage of their expertise in the AI space and leadership in large-scale infrastructure to deliver a strong platform for our ML workloads.” -Noam Shazeer, CEO, Character.AI

At Google Cloud, AI is in our DNA. We’ve applied decades of experience running global scale computing for AI. We designed that infrastructure to scale and be optimized for running a wide variety of AI workloads — and now, we’re making it available to you. To join the Preview waitlist for the A3, please register with this link.

*Data source: https://www.nvidia.com/en-us/data-center/h100/

