GPU Compute
Reduce your cloud costs without sacrificing performance or reliability. mCloud delivers enterprise-grade cloud infrastructure at a fraction of the cost of AWS, Google Cloud, or Azure.
Micron21 operates Australia’s first Tier IV-certified data centre, offering 100% uptime, redundant power, and high-availability architecture.
Our cloud specialists provide 24/7 Australian-based support, ensuring seamless deployments and efficient troubleshooting.
High-Performance Computing
Provided as a High-Availability mCloud Virtual Cloud Server
Overview
The A100 80GB pairs NVIDIA's Ampere architecture and third-generation Tensor Cores with 80GB of HBM2e memory and the world's fastest GPU memory bandwidth, at over 2 TB/s. That capacity keeps the largest models and datasets resident on the GPU, shortening time to solution for the most demanding AI and HPC workloads. Multi-Instance GPU (MIG) can also partition one card into as many as seven isolated 10GB instances.
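As a rough illustration of what keeping a model "resident on the GPU" means, the Python sketch below estimates the memory needed just to hold a model's weights at different precisions and checks it against a full 80GB card or a single 10GB MIG instance. It is a back-of-envelope sketch under stated assumptions: 1 GB is treated as 10^9 bytes, and only weights are counted, so activations, KV caches, optimizer state and driver overhead are ignored and real headroom is smaller. The function and constant names are hypothetical, not part of any mCloud or NVIDIA API.

```python
"""Back-of-envelope check: do a model's weights fit in A100 80GB memory?

Illustrative sketch only: counts weights, treats 1 GB as 10**9 bytes,
and ignores activations, KV cache, optimizer state and driver overhead.
"""

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}

FULL_GPU_GB = 80.0   # whole A100 80GB card
MIG_SLICE_GB = 10.0  # smallest MIG instance (one of up to seven)


def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Return the storage needed for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def weights_fit(num_params: float, precision: str, capacity_gb: float,
                headroom: float = 0.0) -> bool:
    """True if the weights fit in capacity_gb after reserving a headroom fraction."""
    usable_gb = capacity_gb * (1.0 - headroom)
    return weight_footprint_gb(num_params, precision) <= usable_gb


if __name__ == "__main__":
    for params, prec in [(7e9, "fp16"), (7e9, "int8"), (30e9, "bf16")]:
        gb = weight_footprint_gb(params, prec)
        print(f"{params / 1e9:.0f}B params @ {prec}: {gb:.0f} GB of weights | "
              f"full GPU: {weights_fit(params, prec, FULL_GPU_GB)} | "
              f"10GB MIG slice: {weights_fit(params, prec, MIG_SLICE_GB)}")
```

The `headroom` argument is a crude stand-in for everything the sketch ignores: `weights_fit(40e9, "fp16", FULL_GPU_GB)` is True because the weights come to exactly 80 GB, but with `headroom=0.1` it becomes False, which is closer to reality since nothing else would fit alongside them.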
Capabilities
Up to 312 TFLOPS of FP16/BF16 Tensor Core performance for deep learning, and 20× the Tensor throughput of the previous Volta generation for training and inference.
Partition a single card into as many as seven fully isolated 10GB instances, each with its own memory, cache, and compute - right-sized acceleration at scale.
Tensor Cores exploit fine-grained structured sparsity in AI models to deliver up to 2× higher performance, most notably for inference but also during training.
Connect two GPUs over an NVLink bridge at up to 600 GB/s of total bidirectional bandwidth, double the previous generation's, for workloads that span multiple cards.
The world's fastest GPU memory - over 2 TB/s of bandwidth at 95% DRAM utilisation efficiency, and 1.7× the bandwidth of the previous generation.
One accelerator for every job - FP64 and TF32 for HPC and training, BF16/FP16 for deep learning, and INT8 for high-throughput inference.
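To make the trade-off between formats concrete: TF32 and BF16 both keep FP32's 8-bit exponent, so they cover the same numeric range, but they store only 10 and 7 mantissa bits respectively, versus FP32's 23. The Python sketch below emulates that loss of precision on the CPU by rounding a float32 value's mantissa with round-to-nearest-even. It is an approximation for intuition only, assumes finite inputs, and does not claim to reproduce the exact conversion performed by the Tensor Cores or by any particular library; the helper names are hypothetical.

```python
import struct

FP32_MANTISSA_BITS = 23
TF32_MANTISSA_BITS = 10  # same 8-bit exponent as FP32
BF16_MANTISSA_BITS = 7   # same 8-bit exponent as FP32


def _to_bits(x: float) -> int:
    """Reinterpret x (after rounding to float32) as a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float32 and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_mantissa(x: float, keep_bits: int) -> float:
    """Round a finite float32 value to keep_bits mantissa bits (ties to even)."""
    if not 0 < keep_bits < FP32_MANTISSA_BITS:
        raise ValueError("keep_bits must be between 1 and 22")
    drop = FP32_MANTISSA_BITS - keep_bits
    bits = _to_bits(x)
    lsb = (bits >> drop) & 1                # last mantissa bit that survives
    bits += (1 << (drop - 1)) - 1 + lsb     # add just under half an ulp, plus lsb: ties go to even
    bits &= ~((1 << drop) - 1)              # clear the dropped mantissa bits
    return _from_bits(bits)


def to_tf32(x: float) -> float:
    """Emulate TF32 precision (10 mantissa bits) on a float32 value."""
    return round_mantissa(x, TF32_MANTISSA_BITS)


def to_bf16(x: float) -> float:
    """Emulate BF16 precision (7 mantissa bits) on a float32 value."""
    return round_mantissa(x, BF16_MANTISSA_BITS)
```

For example, `to_tf32(1/3)` returns 0.333251953125 and `to_bf16(1/3)` returns 0.333984375: both keep FP32's range, but each discards progressively more of the fraction.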
Specifications
Specifications per the NVIDIA A100 Tensor Core GPU datasheet (r4). Peak rates marked “with sparsity” apply only to models pruned to the fine-grained structured-sparsity pattern that Ampere Tensor Cores accelerate.
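The sparsity that the A100's Tensor Cores accelerate is fine-grained and structured: in every contiguous group of four weights, at most two may be non-zero (the 2:4 pattern). The minimal Python sketch below prunes a flat list of weights to that pattern by keeping the two largest magnitudes in each group, then checks the result. It is illustrative only, with hypothetical function names: real workflows prune whole weight tensors with framework tooling and usually fine-tune afterwards to recover accuracy.

```python
from typing import List


def prune_2_4(weights: List[float]) -> List[float]:
    """Zero all but the two largest-magnitude weights in each group of four."""
    if len(weights) % 4:
        raise ValueError("length must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two entries with the largest absolute value.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


def is_2_4_sparse(weights: List[float]) -> bool:
    """True if every group of four contains at most two non-zero values."""
    return all(
        sum(1 for w in weights[start:start + 4] if w != 0.0) <= 2
        for start in range(0, len(weights), 4)
    )
```

For example, `prune_2_4([0.1, -0.9, 0.3, 0.05, 2.0, 0.0, -1.5, 0.2])` keeps -0.9 and 0.3 from the first group and 2.0 and -1.5 from the second, giving `[0.0, -0.9, 0.3, 0.0, 2.0, 0.0, -1.5, 0.0]`. Half the weights are zero in a regular layout, which is what lets the hardware skip them.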
Performance
Each figure compares the A100 against a different reference point, as published by NVIDIA; note the baseline stated with each figure before comparing them.
Memory bandwidth
Over 2 TB/s of HBM2e bandwidth: the fastest GPU memory available.
AI inference throughput
BERT-Large inference vs a CPU-only server (INT8 with sparsity).
Up to 20× higher performance
vs the prior NVIDIA Volta generation, across training and inference.
95% DRAM utilisation efficiency
HBM2e keeps the memory system working at near-peak efficiency.
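A quick way to see why bandwidth matters: when a workload is memory-bound, the time needed to read data from HBM sets a floor on its runtime. The Python sketch below applies that rule using 2.0 TB/s as a round stand-in for "over 2 TB/s" and the quoted 95% DRAM efficiency. The decode-rate helper assumes, as a simplification, that each generation step at batch size 1 reads every weight once and that nothing else (compute, KV cache reads, launch overhead) costs time, so it gives an upper bound rather than a prediction. All names are hypothetical.

```python
PEAK_BANDWIDTH_BPS = 2.0e12   # bytes/s; round lower bound for "over 2 TB/s"
DRAM_EFFICIENCY = 0.95        # quoted DRAM utilisation efficiency


def min_read_time_ms(num_bytes: float,
                     bandwidth_bps: float = PEAK_BANDWIDTH_BPS,
                     efficiency: float = DRAM_EFFICIENCY) -> float:
    """Lower bound on the time to stream num_bytes from HBM once, in milliseconds."""
    return num_bytes / (bandwidth_bps * efficiency) * 1e3


def max_decode_steps_per_s(weight_bytes: float, **kwargs: float) -> float:
    """Upper bound on memory-bound decode steps per second (one full weight read per step)."""
    return 1e3 / min_read_time_ms(weight_bytes, **kwargs)


if __name__ == "__main__":
    weights = 14e9  # e.g. 7B parameters stored in FP16 (illustrative)
    print(f"read all weights once: >= {min_read_time_ms(weights):.1f} ms")
    print(f"decode ceiling: <= {max_decode_steps_per_s(weights):.0f} steps/s")
```

With 14 GB of weights this prints a floor of roughly 7.4 ms per full read and a ceiling of about 136 steps per second; halving the bytes (for example, INT8 instead of FP16) roughly doubles the ceiling, which is why memory bandwidth and precision choice interact.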
Where It Fits
Ideal workloads
Why run it on mCloud
Spin up flagship GPU compute on our Tier IV Australian cloud, or talk to our specialists about sizing the right configuration for your workload.