The NVIDIA HGX™ platform brings together the full power of NVIDIA GPUs, NVIDIA NVLink™, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights for every data center.
The NVIDIA HGX B300 integrates eight NVIDIA Blackwell Ultra GPUs with high-speed interconnects, delivering 1.5x more dense FP4 Tensor Core FLOPS and 2x attention performance versus HGX B200 to propel the data center into a new era of accelerated computing and generative AI. As a premier accelerated scale-up platform with up to 30x more AI Factory output than the previous generation, NVIDIA Blackwell Ultra-based HGX systems are designed for the most demanding generative AI, data analytics, and HPC workloads.
Figure note: DeepSeek-R1, input sequence length (ISL) = 32K, output sequence length (OSL) = 8K. HGX B300 with FP4 and Dynamo disaggregated serving; H100 with FP8 in-flight batching. Projected performance, subject to change.
The frontier curve illustrates the key parameters that determine the token revenue output of an AI factory. The vertical axis represents total GPU throughput, in tokens per second (TPS), for a one-megawatt (1 MW) AI factory, while the horizontal axis quantifies user interactivity and responsiveness as TPS for a single user. At the optimal intersection of throughput and responsiveness, HGX B300 delivers a 30x overall increase in AI factory output compared to the NVIDIA Hopper architecture for maximum token revenue.
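The throughput-versus-interactivity trade-off above can be sketched numerically. The curve shape, peak throughput, price, and responsiveness floor below are all hypothetical placeholders for illustration, not NVIDIA specifications:

```python
# Toy model of the frontier curve: total factory throughput falls as
# per-user TPS rises, because serving each user faster means smaller
# batches and lower aggregate utilization. All constants are assumptions.

def factory_throughput(per_user_tps: float, peak_tps_per_mw: float = 1.0e6,
                       knee: float = 50.0) -> float:
    """Hypothetical frontier: tokens/s per MW as a function of per-user TPS."""
    return peak_tps_per_mw * knee / (knee + per_user_tps)

def token_revenue(per_user_tps: float, price_per_mtok: float = 2.0) -> float:
    """Revenue per second per MW at a flat price per million tokens."""
    return factory_throughput(per_user_tps) / 1e6 * price_per_mtok

# Sweep interactivity targets and keep the revenue-maximizing point that
# still meets a responsiveness floor (here, >= 30 TPS per user).
candidates = list(range(10, 201, 10))
feasible = [tps for tps in candidates if tps >= 30]
best = max(feasible, key=token_revenue)
print(best, round(token_revenue(best), 3))
```

Because this toy frontier is monotonically decreasing, the optimum sits exactly at the responsiveness floor; a real deployment would measure the curve empirically per model and serving stack.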
Figure note: projected performance, subject to change. Performance per GPU, FP8, 16K batch size, 16K sequence length.
The HGX B300 platform delivers up to 2.6x higher training performance for large language models such as DeepSeek-R1. With over 2 TB of high-speed memory and 14.4 TB/s of NVLink Switch bandwidth, it enables massive-scale model training and high-throughput inter-GPU communication.
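A back-of-envelope calculation shows why inter-GPU bandwidth matters at training scale: a ring all-reduce of gradients moves roughly 2(N−1)/N bytes per byte of gradient per GPU. The model size below is a hypothetical example; the 1.8 TB/s link figure comes from the spec table later in this document:

```python
# Idealized all-reduce time on an 8-GPU NVLink domain (ignores latency,
# overlap, and protocol overhead). Model size is an assumed example.

N = 8                      # GPUs on the baseboard
grad_bytes = 70e9 * 2      # e.g., 70B parameters in BF16 (assumption)
link_bw = 1.8e12           # NVLink GPU-to-GPU bandwidth, bytes/s

traffic = 2 * (N - 1) / N * grad_bytes   # bytes each GPU sends in a ring all-reduce
t = traffic / link_bw                    # idealized transfer time, seconds
print(round(t, 3))
```

Even under these idealized assumptions, a full-gradient exchange takes on the order of a tenth of a second, which is why high NVLink bandwidth is essential to keep communication overlapped with compute.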
The data center is the new unit of computing, and networking plays an integral role in scaling application performance across it. Paired with NVIDIA Quantum InfiniBand, HGX delivers world-class performance and efficiency, ensuring full utilization of computing resources.
For AI cloud data centers that deploy Ethernet, HGX is best used with the NVIDIA Spectrum-X networking platform, which powers the highest AI performance over Ethernet. It features Spectrum-X switches and NVIDIA BlueField-3 SuperNICs for optimal resource utilization and performance isolation, delivering consistent, predictable outcomes for thousands of simultaneous AI jobs at every scale. Spectrum-X also enables advanced cloud multi-tenancy and zero-trust security. As a reference design, NVIDIA built Israel-1, a hyperscale generative AI supercomputer composed of Dell PowerEdge XE9680 servers based on the NVIDIA HGX 8-GPU platform, BlueField-3 SuperNICs, and Spectrum-4 switches.
NVIDIA HGX is available as a single baseboard with four or eight NVIDIA Hopper SXM modules, or with eight NVIDIA Blackwell or NVIDIA Blackwell Ultra SXM modules. These powerful combinations of hardware and software lay the foundation for unprecedented AI supercomputing performance.
| | HGX B300 | HGX B200 |
|---|---|---|
| Form Factor | 8x NVIDIA Blackwell Ultra SXM | 8x NVIDIA Blackwell SXM |
| FP4 Tensor Core¹ | 144 PFLOPS / 108 PFLOPS | 144 PFLOPS / 72 PFLOPS |
| FP8/FP6 Tensor Core² | 72 PFLOPS | 72 PFLOPS |
| INT8 Tensor Core² | 3 POPS | 72 POPS |
| FP16/BF16 Tensor Core² | 36 PFLOPS | 36 PFLOPS |
| TF32 Tensor Core² | 18 PFLOPS | 18 PFLOPS |
| FP32 | 600 TFLOPS | 600 TFLOPS |
| FP64/FP64 Tensor Core | 10 TFLOPS | 296 TFLOPS |
| Total Memory | 2.1 TB | 1.4 TB |
| NVIDIA NVLink | Fifth generation | Fifth generation |
| NVIDIA NVLink Switch™ | NVLink 5 Switch | NVLink 5 Switch |
| NVLink GPU-to-GPU Bandwidth | 1.8 TB/s | 1.8 TB/s |
| Total NVLink Bandwidth | 14.4 TB/s | 14.4 TB/s |
| Networking Bandwidth | 1.6 TB/s | 0.8 TB/s |
| Attention Performance³ | 2x | 1x |
1. Specification shown as sparse / dense.
2. Specification with sparsity; the dense figure is half the sparse figure shown.
3. Relative to NVIDIA Blackwell (HGX B200).
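Footnote 2 means the dense Tensor Core figures can be derived from the table directly. A minimal sketch, assuming the 2:1 structured-sparsity ratio the footnote states:

```python
# Derive dense Tensor Core throughput from the sparse specs listed in the
# HGX B300/B200 table above (dense = half of sparse, per footnote 2).

SPARSE_PFLOPS = {"FP8/FP6": 72, "FP16/BF16": 36, "TF32": 18}

dense = {fmt: sparse / 2 for fmt, sparse in SPARSE_PFLOPS.items()}
print(dense)  # {'FP8/FP6': 36.0, 'FP16/BF16': 18.0, 'TF32': 9.0}
```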
HGX H200

| | 4-GPU | 8-GPU |
|---|---|---|
| Form Factor | 4x NVIDIA H200 SXM | 8x NVIDIA H200 SXM |
| FP8 Tensor Core* | 16 PFLOPS | 32 PFLOPS |
| INT8 Tensor Core* | 16 POPS | 32 POPS |
| FP16/BF16 Tensor Core* | 8 PFLOPS | 16 PFLOPS |
| TF32 Tensor Core* | 4 PFLOPS | 8 PFLOPS |
| FP32 | 270 TFLOPS | 540 TFLOPS |
| FP64 | 140 TFLOPS | 270 TFLOPS |
| FP64 Tensor Core | 270 TFLOPS | 540 TFLOPS |
| Total Memory | 564 GB HBM3E | 1.1 TB HBM3E |
| GPU Aggregate Bandwidth | 19 TB/s | 38 TB/s |
| NVLink | Fourth generation | Fourth generation |
| NVSwitch | N/A | NVLink 4 Switch |
| NVSwitch GPU-to-GPU Bandwidth | N/A | 900 GB/s |
| Total Aggregate Bandwidth | 3.6 TB/s | 7.2 TB/s |
| Networking Bandwidth | 0.4 TB/s | 0.8 TB/s |
HGX H100

| | 4-GPU | 8-GPU |
|---|---|---|
| Form Factor | 4x NVIDIA H100 SXM | 8x NVIDIA H100 SXM |
| FP8 Tensor Core* | 16 PFLOPS | 32 PFLOPS |
| INT8 Tensor Core* | 16 POPS | 32 POPS |
| FP16/BF16 Tensor Core* | 8 PFLOPS | 16 PFLOPS |
| TF32 Tensor Core* | 4 PFLOPS | 8 PFLOPS |
| FP32 | 270 TFLOPS | 540 TFLOPS |
| FP64 | 140 TFLOPS | 270 TFLOPS |
| FP64 Tensor Core | 270 TFLOPS | 540 TFLOPS |
| Total Memory | 320 GB HBM3 | 640 GB HBM3 |
| GPU Aggregate Bandwidth | 13 TB/s | 27 TB/s |
| NVLink | Fourth generation | Fourth generation |
| NVSwitch | N/A | NVLink 4 Switch |
| NVSwitch GPU-to-GPU Bandwidth | N/A | 900 GB/s |
| Total Aggregate Bandwidth | 3.6 TB/s | 7.2 TB/s |
| Networking Bandwidth | 0.4 TB/s | 0.8 TB/s |
* With sparsity
Learn more about the NVIDIA Blackwell architecture.