GPU Compute
Reduce your cloud costs without sacrificing performance or reliability. mCloud delivers enterprise-grade cloud infrastructure at a fraction of the cost of AWS, Google Cloud, or Azure.
Micron21 operates Australia’s first Tier IV-certified data centre, offering 100% uptime, redundant power, and high-availability architecture.
Our cloud specialists provide 24/7 Australian-based support, ensuring seamless deployments and efficient troubleshooting.
High-Performance Computing
Provided as a High-Availability mCloud Virtual Cloud Server
Overview
The A100 40GB pairs NVIDIA's Ampere architecture and third-generation Tensor Cores with 40GB of high-bandwidth HBM2 memory. It supports the full range of math precisions, from FP64 for HPC to INT8 for inference, making it a single accelerator that adapts to almost any data-centre workload, while Multi-Instance GPU lets one card serve up to seven isolated jobs at once.
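As a rough illustration of how precision choice interacts with the 40GB card and its seven-way split, the sketch below estimates weight memory for a model at each precision. The 5GB slice size assumes seven equal Multi-Instance GPU partitions; the 20% headroom for activations and runtime overhead, the decimal-gigabyte convention, and the model sizes are illustrative assumptions, not mCloud or NVIDIA sizing guidance.

```python
# Storage width in bytes for the precisions named in the overview.
# TF32 is a compute mode; tensors are still stored as 32-bit floats.
BYTES_PER_VALUE = {"FP64": 8, "TF32": 4, "FP16": 2, "BF16": 2, "INT8": 1}

GB = 10**9  # decimal gigabyte (assumption; some tools report GiB instead)

FULL_CARD_GB = 40  # A100 40GB, from the overview
MIG_SLICE_GB = 5   # assumes the card is split into seven equal instances


def weight_footprint_gb(num_params: int, precision: str) -> float:
    """Memory needed to hold the model weights alone, in GB."""
    return num_params * BYTES_PER_VALUE[precision] / GB


def fits(num_params: int, precision: str, capacity_gb: float,
         headroom: float = 0.2) -> bool:
    """True if the weights fit while leaving `headroom` of capacity free.

    The headroom fraction is a hypothetical rule of thumb standing in for
    activations, caches, and framework overhead.
    """
    usable_gb = capacity_gb * (1 - headroom)
    return weight_footprint_gb(num_params, precision) <= usable_gb


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the arithmetic.
    for params in (1_500_000_000, 7_000_000_000):
        for precision in ("FP16", "INT8"):
            size = weight_footprint_gb(params, precision)
            print(f"{params / 1e9:.1f}B params @ {precision}: {size:.1f} GB, "
                  f"full card: {fits(params, precision, FULL_CARD_GB)}, "
                  f"5GB slice: {fits(params, precision, MIG_SLICE_GB)}")
```

Under these assumptions, halving the precision width halves the weight footprint, which is often what decides whether a model can be served from a single isolated slice or needs the whole card.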
Capabilities
Up to 312 TFLOPS of deep-learning performance and up to 20× the Tensor Core throughput of the previous Volta generation for training and inference.
Partition a single card into as many as seven fully isolated 5GB instances, each with its own memory, cache, and compute, which makes it ideal for multi-tenant serving.
Tensor Cores exploit sparsity in AI models to deliver up to 2× higher performance, most notably for inference but also during training.
Connect two GPUs over an NVLink bridge at 600 GB/s, double the previous generation's throughput, for workloads that outgrow a single card.
1,555 GB/s of memory bandwidth keeps the Tensor Cores fed, with enough capacity for production inference, fine-tuning, and mid-sized training runs.
One accelerator for every job: FP64 and TF32 for HPC and training, BF16/FP16 for deep learning, and INT8 for high-throughput inference.
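To see why memory bandwidth matters alongside peak compute, the two headline figures above (312 TFLOPS and 1,555 GB/s) can be combined in a simplified roofline estimate. This is a back-of-the-envelope sketch: it ignores caches, kernel efficiency, and the fact that the peak rate applies only to certain precisions, and the function names are hypothetical.

```python
PEAK_TFLOPS = 312.0    # peak deep-learning rate quoted in the capabilities
MEM_BW_GBPS = 1555.0   # memory bandwidth quoted in the capabilities


def ridge_point(peak_tflops: float = PEAK_TFLOPS,
                bw_gbps: float = MEM_BW_GBPS) -> float:
    """Arithmetic intensity (FLOPs per byte moved) at which a kernel
    stops being limited by memory bandwidth and becomes compute-bound."""
    return (peak_tflops * 1e12) / (bw_gbps * 1e9)


def attainable_tflops(flops_per_byte: float,
                      peak_tflops: float = PEAK_TFLOPS,
                      bw_gbps: float = MEM_BW_GBPS) -> float:
    """Roofline bound: the lower of the compute peak and the rate that
    memory bandwidth can sustain at the given arithmetic intensity."""
    bandwidth_bound = bw_gbps * 1e9 * flops_per_byte / 1e12
    return min(peak_tflops, bandwidth_bound)


if __name__ == "__main__":
    print(f"Ridge point: {ridge_point():.1f} FLOPs per byte")
    for intensity in (1, 10, 100, 1000):
        print(f"{intensity:>5} FLOPs/byte -> "
              f"{attainable_tflops(intensity):.2f} TFLOPS attainable")
```

On these figures a kernel needs roughly 200 FLOPs per byte of memory traffic before the Tensor Cores, rather than memory, become the limit, so low-intensity work such as small-batch inference tends to be governed by bandwidth.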
Specifications
Specifications per the NVIDIA A100 Tensor Core GPU datasheet (r4). Peak rates marked “with sparsity” require structural-sparsity-enabled models.
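For readers unfamiliar with what a sparsity-enabled model looks like, the pattern NVIDIA documents for Ampere Tensor Cores is 2:4 structured sparsity: in every group of four consecutive weights, at most two are non-zero. The sketch below applies that pattern to a flat list of weights by keeping the two largest magnitudes per group. The magnitude-based selection and the function name are illustrative assumptions; practical workflows typically prune with vendor tooling and then fine-tune to recover accuracy.

```python
def prune_2_of_4(weights: list[float]) -> list[float]:
    """Zero the two smallest-magnitude values in each group of four,
    producing a 2:4 structured-sparse copy of `weights`."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")

    pruned: list[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes. The sort is stable, so
        # ties are resolved in favour of the lower index.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    dense = [0.1, -0.9, 0.4, 0.05, 2.0, -0.3, 0.0, 1.5]
    print(prune_2_of_4(dense))
```

Because exactly half of each group is zero, the hardware can skip those multiplications, which is where the "up to 2×" peak figures for sparse models come from.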
Performance
Each figure compares the A100 against a different reference point, as published by NVIDIA. Note the baseline beneath each number.
Higher performance
vs the prior NVIDIA Volta generation, across AI training and inference.
AI inference throughput
BERT-Large inference vs a CPU-only server (INT8 with sparsity).
HPC throughput
Across top HPC applications vs the NVIDIA P100, a generational gain over four years.
Sparse-model speed-up
From structural sparsity in Tensor Cores, primarily for inference.
Where It Fits
Ideal workloads
Why run it on mCloud
Spin up GPU compute on our Tier IV Australian cloud, or talk to our specialists about sizing the right configuration for your workload.