Nvidia unveils next-gen accelerated computing platform

Graphics processing chipmaker Nvidia has announced its next-generation accelerated computing platform to succeed the Nvidia Ampere architecture, launched two years ago.
By Rich Pell

Named for Grace Hopper, a pioneering U.S. computer scientist, the new Nvidia Hopper architecture delivers an order-of-magnitude performance leap over its predecessor. The company also announced its first Hopper-based GPU, the Nvidia H100, packed with 80 billion transistors.

Billed as the world’s largest and most powerful accelerator, the H100 has, the company says, groundbreaking features such as a revolutionary Transformer Engine and a highly scalable Nvidia NVLink interconnect for advancing gigantic AI language models, deep recommender systems, genomics and complex digital twins.

“Data centers are becoming AI factories – processing and refining mountains of data to produce intelligence,” says Jensen Huang, founder and CEO of Nvidia. “Nvidia H100 is the engine of the world’s AI infrastructure that enterprises use to accelerate their AI-driven businesses.”

According to the company, the Nvidia H100 GPU delivers six breakthrough innovations:

  • World’s Most Advanced Chip — Built with 80 billion transistors using a cutting-edge TSMC 4N process designed for Nvidia’s accelerated compute needs, H100 features major advances to accelerate AI, HPC, memory bandwidth, interconnect and communication, including nearly 5 terabytes per second of external connectivity. H100 is the first GPU to support PCIe Gen5 and the first to utilize HBM3, enabling 3TB/s of memory bandwidth. Twenty H100 GPUs can sustain the equivalent of the entire world’s internet traffic, making it possible for customers to deliver advanced recommender systems and large language models running inference on data in real time.
  • New Transformer Engine — Now the standard model choice for natural language processing, the Transformer is one of the most important deep learning models ever invented. The H100 accelerator’s Transformer Engine is built to speed up these networks as much as 6x versus the previous generation without losing accuracy.
  • 2nd-Generation Secure Multi-Instance GPU — MIG technology allows a single GPU to be partitioned into seven smaller, fully isolated instances to handle different types of jobs. The Hopper architecture extends MIG capabilities by up to 7x over the previous generation by offering secure multitenant configurations in cloud environments across each GPU instance.
  • Confidential Computing — H100 is the world’s first accelerator with confidential computing capabilities to protect AI models and customer data while they are being processed. Customers can also apply confidential computing to federated learning for privacy-sensitive industries like healthcare and financial services, as well as on shared cloud infrastructures.
  • 4th-Generation Nvidia NVLink — To accelerate the largest AI models, NVLink combines with a new external NVLink Switch to extend NVLink as a scale-up network beyond the server, connecting up to 256 H100 GPUs at 9x higher bandwidth versus the previous generation using Nvidia HDR Quantum InfiniBand.
  • DPX Instructions — New DPX instructions accelerate dynamic programming (used in a broad range of algorithms, including route optimization and genomics) by up to 40x compared with CPUs and up to 7x compared with previous-generation GPUs. Examples include the Floyd-Warshall algorithm, used to find optimal routes for autonomous robot fleets in dynamic warehouse environments, and the Smith-Waterman algorithm, used in sequence alignment for DNA and protein classification and folding.
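
For readers unfamiliar with the two dynamic programming algorithms named in the DPX item, here is a minimal sketch in plain Python. It runs on the CPU and does not use DPX instructions or any Nvidia API; the function names, the example graph, and the scoring values (match=2, mismatch=-1, gap=-2) are illustrative assumptions, not taken from the announcement. It is meant only to show the add-then-min and add-then-max inner loops that both algorithms repeat many times.

```python
import math


def floyd_warshall(weights):
    """All-pairs shortest path lengths for a dense weight matrix.

    weights[i][j] is the edge cost from i to j, or math.inf if there is no edge.
    """
    n = len(weights)
    dist = [row[:] for row in weights]  # copy so the input is not modified
    for k in range(n):
        for i in range(n):
            through_k = dist[i][k]
            for j in range(n):
                # Inner-loop step: an addition followed by a minimum.
                candidate = through_k + dist[k][j]
                if candidate < dist[i][j]:
                    dist[i][j] = candidate
    return dist


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b with a linear gap penalty.

    The scoring values are illustrative defaults (an assumption).
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Inner-loop step: additions followed by a maximum, floored at 0.
            curr[j] = max(0, prev[j - 1] + sub, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    inf = math.inf
    # Hypothetical 4-node route graph.
    graph = [
        [0, 4, inf, 10],
        [inf, 0, 1, inf],
        [inf, inf, 0, 2],
        [inf, inf, inf, 0],
    ]
    print(floyd_warshall(graph)[0][3])
    print(smith_waterman_score("ACGT", "TTACGTTT"))
```

Both functions spend nearly all of their time in the commented inner-loop step, which is why hardware support for that pattern can plausibly speed up a whole family of algorithms rather than just one.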

The combined technology innovations of H100, says the company, extend its AI inference and training leadership to enable real-time and immersive applications using giant-scale AI models. The H100 will enable chatbots using the world’s most powerful monolithic transformer language model, Megatron 530B, with up to 30x higher throughput than the previous generation, while meeting the subsecond latency required for real-time conversational AI.

H100 also allows researchers and developers to train massive models such as Mixture of Experts, with 395 billion parameters, up to 9x faster, reducing the training time from weeks to days. The H100 can be deployed in every type of data center, including on-premises, cloud, hybrid-cloud and edge, and is expected to be available worldwide later this year from the world’s leading cloud service providers and computer makers, as well as directly from Nvidia.
