mCloud GPU - NVIDIA A100 40GB vs 80GB - mCloud - Enterprise

Specifications

Where They Differ

The differences are concentrated in memory capacity, memory bandwidth, and power draw. The compute specifications listed after the two cards are identical on both models.

Best for cost-efficiency

NVIDIA A100

40GB HBM2

The price-to-performance choice for inference, fine-tuning, and MIG-partitioned multi-tenant workloads that fit comfortably in 40GB.

GPU memory: 40GB HBM2
Memory bandwidth: 1,555 GB/s (PCIe)
Max TDP: 250W (PCIe)
MIG instances: Up to 7 @ 5GB
Lowest power draw: Yes, most efficient per watt

Best for performance

NVIDIA A100

80GB HBM2e

Double the memory and the world's fastest GPU bandwidth, built for the largest models, biggest datasets, and most demanding HPC.

GPU memory: 80GB HBM2e
Memory bandwidth: 1,935 GB/s (PCIe)
Max TDP: 300W (PCIe)
MIG instances: Up to 7 @ 10GB
Peak memory bandwidth: Over 2 TB/s (SXM), fastest available
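
To see how far apart the two cards actually are, the PCIe figures listed above can be turned into simple ratios. The following is a minimal Python sketch that uses only the values quoted in the two cards; the dictionary layout and the `ratio` helper are illustrative names, not part of any NVIDIA or mCloud tooling.

```python
# Spec values copied from the two cards above (PCIe figures).
A100_40GB = {"memory_gb": 40, "bandwidth_gb_per_s": 1555, "tdp_w": 250, "mig_slice_gb": 5}
A100_80GB = {"memory_gb": 80, "bandwidth_gb_per_s": 1935, "tdp_w": 300, "mig_slice_gb": 10}


def ratio(key: str) -> float:
    """Return the 80GB value divided by the 40GB value, rounded to 2 places."""
    return round(A100_80GB[key] / A100_40GB[key], 2)


# Print one ratio per listed specification.
for key in A100_40GB:
    print(f"{key}: {ratio(key)}x")
```

On these figures the 80GB PCIe card has twice the memory and twice the MIG slice size, roughly 1.24× the memory bandwidth, and 1.2× the maximum TDP.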

Identical on both models

Same Ampere die, same third-generation Tensor Cores: peak compute does not change with memory capacity.

FP64: 9.7 TFLOPS
FP32 / FP64 Tensor Core: 19.5 TFLOPS
TF32 Tensor Core*: 312 TFLOPS
FP16 Tensor Core*: 624 TFLOPS
BFLOAT16 Tensor Core*: 624 TFLOPS
INT8 Tensor Core*: 1,248 TOPS
Max MIG instances: 7
GPU architecture: Ampere

* With sparsity.

Benefits

Two Models, Two Strengths

A100 40GB: the efficient workhorse

Maximise value per GPU

Lower cost of entry. The most affordable way onto the A100 platform when your workloads fit within 40GB of memory.
Lower power draw. A 250W PCIe TDP versus 300W on the 80GB means less energy per GPU, with better efficiency and density.
Ideal for inference & fine-tuning. Delivers up to 245× the BERT-Large inference throughput of a CPU-only server.
Multi-tenant ready. Partition into up to seven 5GB MIG instances to right-size acceleration across many users and jobs.
Best price-to-performance for development environments, batch inference, and models that don't need the extra headroom.

A100 80GB: the performance flagship

Remove the memory ceiling

Double the memory. 80GB of HBM2e keeps the largest models and most massive datasets resident on-GPU.
World's fastest bandwidth. Over 2 TB/s feeds the Tensor Cores without starving them, speeding time to solution.
Up to 3× faster AI training on the largest models, where the 40GB has to compromise on batch size.
Up to 2× big-data analytics and up to 1.8× HPC throughput versus the 40GB model.
Larger MIG slices. Seven 10GB instances mean each partition can host substantially bigger workloads.
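
Whether a workload "fits within 40GB" or needs the larger card can be roughed out before provisioning. The sketch below is an inference-style estimate only: it counts model weights and multiplies by an assumed overhead factor. The function names, the 1.2 overhead, and the 2-bytes-per-parameter default (FP16) are all illustrative assumptions, and the capacities are the nominal figures from this page. Usable memory is somewhat lower in practice, and training needs considerably more for gradients and optimizer state.

```python
from typing import Optional

# Nominal memory capacities quoted in this comparison, smallest first (GB).
OPTIONS = [
    ("A100 40GB, 1/7 MIG slice", 5),
    ("A100 80GB, 1/7 MIG slice", 10),
    ("A100 40GB, full GPU", 40),
    ("A100 80GB, full GPU", 80),
]


def estimate_footprint_gb(num_params: float, bytes_per_param: int = 2,
                          overhead: float = 1.2) -> float:
    """Rough footprint: weights only, scaled by an ASSUMED overhead factor.

    bytes_per_param=2 corresponds to FP16/BF16 weights. The 1.2 overhead is a
    placeholder for activations and buffers, not a measured value.
    """
    return num_params * bytes_per_param * overhead / 1e9


def smallest_fit(num_params: float, **kwargs) -> Optional[str]:
    """Return the smallest listed option that holds the estimate, or None."""
    need = estimate_footprint_gb(num_params, **kwargs)
    for name, capacity_gb in OPTIONS:
        if need <= capacity_gb:
            return name
    return None


print(smallest_fit(1e9))    # about 2.4 GB estimated
print(smallest_fit(13e9))   # about 31.2 GB estimated
print(smallest_fit(30e9))   # about 72 GB estimated
```

Treat the result as a first filter rather than a sizing guarantee; measuring the real footprint on a trial instance is the more reliable check.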

Performance · AI Training

Training the Largest Models

Training huge recommender models like DLRM is bound by how much fits in GPU memory. The 80GB's extra capacity allows larger batch sizes, delivering up to 3× the training throughput of the 40GB, and well beyond the previous-generation V100.

NVIDIA V100 (FP16, batch 32): 0.7×
A100 40GB (FP16, batch 32): 1×
A100 80GB (FP16, batch 48): 3×

DLRM training on the HugeCTR framework, FP16, relative time per 1,000 iterations. NVIDIA A100 datasheet.

Performance · AI Inference

Inference Throughput over CPU

For high-throughput inference like BERT-Large, both A100 models leave CPU-only servers far behind, delivering more than 240× the sequences per second. On this workload the two cards are effectively matched, so the 40GB delivers flagship inference without the memory premium.

CPU only (dual Xeon Gold 6240, FP32): 1×
A100 40GB (INT8 + sparsity): 245×
A100 80GB (INT8 + sparsity): 249×

BERT-Large inference, sequences per second. CPU: dual Xeon Gold 6240, FP32, batch 128. A100 40GB and 80GB: batch 256, INT8 with sparsity. NVIDIA A100 datasheet.
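
Because the two cards are nearly matched on this workload, the value question comes down to price. The sketch below divides the relative throughput from the chart by an hourly price. The prices in the example are placeholders in arbitrary units and are not mCloud pricing, and the helper names are hypothetical; substitute real quotes from the pricing calculator.

```python
# Relative BERT-Large throughput from the chart above (CPU-only = 1x).
RELATIVE_THROUGHPUT = {"A100 40GB": 245.0, "A100 80GB": 249.0}


def throughput_per_dollar(gpu: str, hourly_price: float) -> float:
    """Relative throughput delivered per unit of hourly spend."""
    return RELATIVE_THROUGHPUT[gpu] / hourly_price


def better_value(prices: dict) -> str:
    """Return the GPU with the highest relative throughput per unit price."""
    return max(prices, key=lambda gpu: throughput_per_dollar(gpu, prices[gpu]))


# PLACEHOLDER prices in arbitrary units per hour; NOT real mCloud pricing.
example_prices = {"A100 40GB": 1.00, "A100 80GB": 1.50}
print(better_value(example_prices))
```

With a throughput gap of 249 to 245, the 80GB only comes out ahead on this particular workload if its price premium is under about 1.6%.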

Performance · Real-Time Inference

Where Memory Helps Inference

On latency-sensitive, single-stream inference such as RNN-T speech recognition, the 80GB pulls ahead, delivering up to 1.25× the throughput of the 40GB when each card is measured on a single 1/7 MIG slice (10GB versus 5GB).

A100 40GB (1/7 MIG slice): 1×
A100 80GB (1/7 MIG slice): 1.25×

RNN-T single-stream inference, MLPerf 0.7, measured on one (1/7) MIG slice. TensorRT 7.2, LibriSpeech, FP16. NVIDIA A100 datasheet.

Performance · Data Analytics

Big Data Analytics at Scale

On a 10TB analytics benchmark spanning ETL, SQL, ML, and NLP, the 80GB finishes in half the time of the 40GB (2× faster) and is up to 8× faster than the V100.

V100 32GB (RAPIDS / Dask): 1×
A100 40GB (RAPIDS / Dask / BlazingSQL): 4×
A100 80GB (RAPIDS / Dask / BlazingSQL): 8×

GPU-BDB big-data analytics benchmark: 30 retail queries plus ETL, ML, and NLP on a 10TB dataset, relative time to solution. NVIDIA A100 datasheet.
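
The chart reports relative time to solution, so the multipliers can be read as divisors on a baseline run time. A small Python sketch of that arithmetic follows; the 8-hour baseline is a hypothetical figure chosen for illustration, not a measured result, and the function names are made up for this example.

```python
# Relative speedups from the chart above (V100 32GB = 1x baseline).
SPEEDUP = {"V100 32GB": 1.0, "A100 40GB": 4.0, "A100 80GB": 8.0}


def time_to_solution(baseline_hours: float, gpu: str) -> float:
    """Scale a baseline (V100) run time by the GPU's relative speedup."""
    return baseline_hours / SPEEDUP[gpu]


def speedup_between(fast: str, slow: str) -> float:
    """How many times faster `fast` is than `slow`, per the chart."""
    return SPEEDUP[fast] / SPEEDUP[slow]


baseline = 8.0  # HYPOTHETICAL V100 run time in hours, not a measured figure
for gpu in SPEEDUP:
    print(f"{gpu}: {time_to_solution(baseline, gpu):.1f} h")
print(speedup_between("A100 80GB", "A100 40GB"))
```

The same division applies to the other relative charts on this page, provided the multipliers are taken from a single chart and not mixed across benchmarks.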

Performance · HPC

High-Performance Computing

For memory-bound HPC like Quantum Espresso, the 80GB delivers up to 1.8× the performance of the 40GB at full FP64 precision. Across the top HPC applications, the A100 generation is around 11× faster than the 2016 P100.

A100 40GB (Quantum Espresso, FP64): 1×
A100 80GB (Quantum Espresso, FP64): 1.8×

Quantum Espresso, CNT10POR8 dataset, FP64, relative time to solution. The 11× figure is the geometric-mean speedup over P100 across top HPC apps. NVIDIA A100 datasheet.

Not sure which A100 you need?

Configure either model with the pricing calculator, or talk to our Australian-based cloud specialists about matching the right GPU to your workload.
