- Nvidia said the H200 Tensor Core GPU has more memory capacity and bandwidth, accelerating its work with generative AI and HPC workloads.
- The Nvidia H200 is the first GPU to offer HBM3e — faster, larger memory to drive generative AI and LLM acceleration.
- The H200 chips are set to ship in 2Q24, and Nvidia said it will work with “global system manufacturers and cloud providers” for widespread availability.
It’s been almost a year since the launch of OpenAI’s ChatGPT, and global demand for AI chips has never been more insatiable. Most major tech companies are now focusing their attention on generative artificial intelligence, and business has never been better for the company that makes the highest-performance graphics processing units (GPUs): Nvidia Corp. After releasing dozens of chips for an AI market growing at a seemingly exponential rate, the graphics chip giant has teased its most powerful GPU yet – the H200.
The Nvidia H200 Tensor Core GPU arrives as Nvidia tries to defend its dominant position in AI computing in the face of Intel, AMD, and a slew of chip startups and cloud providers like Amazon Web Services, all trying to capture market share amid booming chip demand fueled by generative AI workloads.
To maintain its leadership in AI hardware and high-performance computing (HPC), Nvidia unveiled plans early last month to accelerate the development of new GPU architectures. The idea is to return to a one-year product-introduction cadence, according to the road map published for investors and further explained by SemiAnalysis. “Nvidia’s move to annual AI GPU updates is very significant and has many ramifications,” SemiAnalysis said in a report.
And the start of it all is the H200, which Nvidia unveiled yesterday, built on the Hopper architecture to accelerate AI applications. It is the successor to the H100 GPU, released last year and previously Nvidia’s most powerful AI chip. In short, the H200 is now the most powerful AI chip in Nvidia’s portfolio.
Ian Buck, vice president of hyperscale and HPC at Nvidia, said that with the “Nvidia H200, the industry’s leading end-to-end AI supercomputing platform just got faster to solve some of the world’s most important challenges.” In general, GPUs excel at artificial intelligence applications because they can perform numerous matrix multiplications in parallel, the key operation underlying neural networks.
They play a vital role both in the training phase of constructing an AI model and in the later “inference” phase, where users feed data into the model and it produces results. “To create intelligence with generative AI and HPC applications, massive amounts of data must be efficiently processed at high speed using large, fast GPU memory,” noted Buck.
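As an illustrative sketch (not Nvidia code, and with hypothetical shapes), the operation described above can be seen in a single dense neural-network layer, whose forward pass is just one large matrix multiplication — exactly the kind of work a GPU parallelizes:

```python
import numpy as np

# Illustrative only: a dense layer's forward pass is one matrix multiply.
# The batch and layer sizes below are made up for the example; on a GPU,
# the many independent multiply-accumulate operations inside this matmul
# run in parallel.
batch, d_in, d_out = 32, 4096, 4096
x = np.random.rand(batch, d_in).astype(np.float32)   # input activations
w = np.random.rand(d_in, d_out).astype(np.float32)   # layer weights
y = x @ w                                            # the parallel workload
print(y.shape)  # (32, 4096)
```

Both training and inference reduce largely to long sequences of such multiplications, which is why memory capacity and bandwidth matter so much for feeding the GPU's compute units.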
Accordingly, the H200 delivers further performance leaps, including nearly double the inference speed on Llama 2, the 70-billion-parameter LLM, compared with the H100. Nvidia says additional performance leadership and improvements are expected from future software updates.
Other details about the Nvidia H200
While the H200 appears largely similar to the H100, its memory is a significant upgrade. The new GPU introduces a faster memory specification known as HBM3e, which raises memory bandwidth to 4.8 terabytes per second, a significant increase over the H100’s 3.35 terabytes per second, and expands total memory capacity to 141 GB from its predecessor’s 80 GB.
“Nvidia H200 is the first GPU to offer HBM3e — faster, larger memory to drive acceleration of generative artificial intelligence and large language models (LLM) while advancing scientific computing for HPC workloads. With HBM3e, the NVIDIA H200 delivers 141 GB of memory at 4.8 terabytes per second, nearly double the capacity and 2.4x the bandwidth compared to its predecessor, the NVIDIA A100,” the chip giant said.
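A quick back-of-the-envelope check (a sketch using the figures quoted in this article, plus the A100's published ~2.0 TB/s HBM2e bandwidth, which is an assumption not stated here) reproduces Nvidia's "nearly double the capacity, 2.4x the bandwidth" claim:

```python
# Spec figures as quoted in the article; the A100 bandwidth (~2.0 TB/s)
# is an assumption based on its published HBM2e specification.
h200_mem_gb, h200_bw_tbs = 141, 4.8
h100_mem_gb, h100_bw_tbs = 80, 3.35
a100_mem_gb, a100_bw_tbs = 80, 2.0

# Nvidia's comparison is against the A100, not the H100.
cap_vs_a100 = h200_mem_gb / a100_mem_gb   # ~1.76x, i.e. "nearly double"
bw_vs_a100 = h200_bw_tbs / a100_bw_tbs    # 2.4x
bw_vs_h100 = h200_bw_tbs / h100_bw_tbs    # ~1.43x

print(f"vs A100: {cap_vs_a100:.2f}x capacity, {bw_vs_a100:.1f}x bandwidth")
print(f"vs H100: {bw_vs_h100:.2f}x bandwidth")
```

Note that the headline multipliers hold against the older A100; against the H100, the H200's gain is the roughly 1.4x bandwidth bump plus the 80 GB → 141 GB capacity jump.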
For context, OpenAI has often cited a shortage of GPU resources as a cause of slower ChatGPT performance, resorting to rate limiting to maintain service. In theory, the H200 could ease the resource constraints on the language models powering ChatGPT, allowing them to efficiently serve a more extensive user base.
Nvidia also said it will make the H200 available in several form factors. This includes Nvidia HGX H200 server boards in four-way and eight-way configurations, compatible with HGX H100 system hardware and software. It will also be available in the Nvidia GH200 Grace Hopper Superchip, which combines the CPU and GPU in one package.
“With these options, the H200 can be deployed in every type of data center, including on-premises, cloud, hybrid cloud and edge. NVIDIA’s global ecosystem of partner server manufacturers — including ASRock Rack, ASUS, Dell Technologies, Eviden, GIGABYTE, Hewlett Packard Enterprise, Ingrasys, Lenovo, QCT, Supermicro, Wistron and Wiwynn — can update their existing systems with the H200,” Nvidia noted.
According to the US chip giant, Amazon Web Services (AWS), Google Cloud, Microsoft Azure and Oracle Cloud Infrastructure will be among the first cloud providers to deploy H200-based instances starting next year, along with CoreWeave, Lambda and Vultr. For now, Nvidia remains at the forefront of the AI GPU market.
However, major players like AWS, Google and Microsoft, as well as traditional AI and HPC rivals such as AMD, are actively preparing their next-generation processors for training and inference. In response to this competitive environment, Nvidia has accelerated its B100- and X100-based product timelines.