NVIDIA A100 40GB

Data-centre acceleration for AI inference, fine-tuning, HPC, and analytics, with up to 20× the performance of the previous generation, available on demand from our Tier IV Australian cloud.

Available now · 40GB HBM2 · 1,555 GB/s bandwidth · MIG instances · 312 TFLOPS FP16 Tensor
 
EOFY Sale: Limited Offer

GPU Compute

NVIDIA A100 Tensor Core GPU

Cost Effective Resources & Infrastructure

Reduce your cloud costs without sacrificing performance or reliability. mCloud delivers enterprise-grade cloud infrastructure at a fraction of the cost of AWS, Google Cloud, or Azure.

Fault-Tolerant Tier IV Data Centre

Micron21 operates Australia’s first Tier IV-certified data centre, offering 100% uptime, redundant power, and a high-availability architecture.

24/7 Australian-based Expert Support

Our cloud specialists provide 24/7 Australian-based support, ensuring seamless deployments and efficient troubleshooting.

NVIDIA A100 (40 GB)

High-Performance Computing

$1,618 AUD / month

Minimum Specifications

Provided as a High-Availability mCloud Virtual Cloud Server

  • GPU: NVIDIA A100 (40 GB)
  • GPU Compute: Dedicated
  • vCPU: 12 Cores - Xeon Gold
  • RAM: 64 GB - DDR4
  • Storage: 500 GB - NVMe SSD
  • Bandwidth: 2 TB per month
  • IP Address: Included
  • DDoS Protection: Shield

Overview

Enterprise Acceleration, Right-Sized

The A100 40GB pairs NVIDIA's Ampere architecture and third-generation Tensor Cores with 40GB of high-bandwidth HBM2 memory. It supports the full range of math precisions, from FP64 for HPC to INT8 for inference, making it a single accelerator that adapts to almost any data-centre workload, while Multi-Instance GPU lets one card serve up to seven isolated jobs at once.

  • GPU memory: 40GB HBM2
  • Memory bandwidth: 1,555 GB/s
  • MIG instances: 7 @ 5GB
  • Accelerated applications: 2,000+

Capabilities

What Makes the A100 40GB Different

Third-Gen Tensor Cores

Up to 312 TFLOPS of deep-learning performance and 20× the Tensor throughput of the previous Volta generation for training and inference.

Multi-Instance GPU (MIG)

Partition a single card into as many as seven fully isolated 5GB instances, each with its own memory, cache, and compute: ideal for multi-tenant serving.
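
As a rough sizing aid, the sketch below picks the smallest MIG profile that would hold a job of a given memory footprint. The profile names and per-card limits are the nominal A100 40GB profiles from NVIDIA's MIG documentation rather than anything stated on this page, usable memory per instance is slightly below the nominal figure, and `smallest_profile` is a hypothetical helper, not part of any NVIDIA or mCloud tool.

```python
# Nominal MIG profiles for an A100 40GB, per NVIDIA's MIG documentation:
# (profile name, nominal memory in GB, maximum instances per card).
# Listed smallest first so the first match is the tightest fit.
MIG_PROFILES = [
    ("1g.5gb", 5, 7),
    ("2g.10gb", 10, 3),
    ("3g.20gb", 20, 2),
    ("4g.20gb", 20, 1),
    ("7g.40gb", 40, 1),
]


def smallest_profile(required_gb):
    """Return (name, memory_gb, max_instances) for the smallest profile that fits."""
    for name, memory_gb, max_instances in MIG_PROFILES:
        if required_gb <= memory_gb:
            return name, memory_gb, max_instances
    raise ValueError(f"{required_gb} GB exceeds a single A100 40GB")


if __name__ == "__main__":
    # Hypothetical job footprints in GB, for illustration only.
    for job_gb in (3.5, 8, 18, 36):
        name, memory_gb, max_instances = smallest_profile(job_gb)
        print(f"{job_gb:>5} GB job -> {name} (up to {max_instances} per card)")
```

A job that needs 3.5 GB lands in the 5GB profile, so seven such jobs could share one card; an 18 GB job needs a 20GB profile, which limits the card to two.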

Structural Sparsity

Tensor Cores exploit sparsity in AI models to deliver up to 2× higher performance, most notably for inference but also during training.
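
To make "structural sparsity" concrete: on the Ampere architecture the supported pattern is 2:4, meaning at most two non-zero values in every group of four weights. This page does not spell the pattern out, so treat that detail as drawn from NVIDIA's Ampere documentation. The sketch below is a plain-Python illustration of magnitude-based 2:4 pruning; `prune_2_of_4` is a hypothetical helper, not NVIDIA's tooling, and real workflows usually fine-tune the model after pruning to recover accuracy.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four weights."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Positions of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
    print(prune_2_of_4(row))
    # prints [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Half of the values in every group are zero after pruning, which is the regular structure the Tensor Cores rely on for the up-to-2× figure.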

Next-Gen NVLink

Connect two GPUs over an NVLink bridge at 600 GB/s, double the previous generation's throughput, for workloads that outgrow a single card.

40GB HBM2 Memory

1,555 GB/s of memory bandwidth keeps the Tensor Cores fed, with enough capacity for production inference, fine-tuning, and mid-sized training runs.
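
One way to read "keeps the Tensor Cores fed" is through a simple roofline estimate: a kernel only reaches peak compute if it performs enough arithmetic per byte moved from memory. The sketch below uses the two peak figures quoted on this page (312 TFLOPS dense FP16 Tensor, 1,555 GB/s); the function names are hypothetical and the model ignores caches and real-world efficiency, so treat the result as an upper bound.

```python
PEAK_FP16_TENSOR_FLOPS = 312e12  # 312 TFLOPS, dense FP16 Tensor Core peak
MEMORY_BANDWIDTH_BYTES = 1555e9  # 1,555 GB/s HBM2 bandwidth


def ridge_point(peak_flops=PEAK_FP16_TENSOR_FLOPS,
                bandwidth=MEMORY_BANDWIDTH_BYTES):
    """FLOPs per byte at which a kernel stops being memory-bound."""
    return peak_flops / bandwidth


def attainable_flops(intensity, peak_flops=PEAK_FP16_TENSOR_FLOPS,
                     bandwidth=MEMORY_BANDWIDTH_BYTES):
    """Roofline bound: the lower of the compute peak and intensity x bandwidth."""
    return min(peak_flops, intensity * bandwidth)


if __name__ == "__main__":
    print(f"Ridge point: {ridge_point():.1f} FLOP/byte")
    # Illustrative arithmetic intensities in FLOP/byte.
    for intensity in (10, 100, 1000):
        tflops = attainable_flops(intensity) / 1e12
        print(f"{intensity:>5} FLOP/byte -> at most {tflops:.1f} TFLOPS")
```

With these figures the ridge point is roughly 200 FLOP/byte: kernels below it are limited by memory bandwidth, and kernels above it by the Tensor Cores.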

Every Math Precision

One accelerator for every job: FP64 and TF32 for HPC and training, BF16/FP16 for deep learning, and INT8 for high-throughput inference.
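
Precision also determines how much of the 40GB a model's weights occupy. The sketch below estimates weight storage only, assuming 1 GB = 10^9 bytes and a hypothetical 7-billion-parameter model; it ignores activations, KV cache, and optimizer state, so real requirements will be higher. `weights_gb` and `fits` are illustrative names.

```python
# Bytes needed to store one parameter at each precision.
# TF32 is a Tensor Core compute mode; tensors are still stored as FP32.
BYTES_PER_PARAM = {"FP64": 8, "FP32": 4, "BF16": 2, "FP16": 2, "INT8": 1}

GPU_MEMORY_GB = 40  # A100 40GB


def weights_gb(num_params, precision):
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits(num_params, precision, budget_gb=GPU_MEMORY_GB):
    """True if the weights alone fit within the memory budget."""
    return weights_gb(num_params, precision) <= budget_gb


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for precision in BYTES_PER_PARAM:
        size = weights_gb(params, precision)
        verdict = "fits" if fits(params, precision) else "does not fit"
        print(f"{precision:>5}: {size:5.1f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

Under these assumptions the same model needs 14 GB of weights in FP16 but 7 GB in INT8, which is why lower precision matters for fitting inference workloads onto a single card or a smaller MIG instance.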

Specifications

Technical Specifications

Compute & Tensor Cores

  • FP64: 9.7 TFLOPS
  • FP64 Tensor Core: 19.5 TFLOPS
  • FP32: 19.5 TFLOPS
  • TF32 Tensor Core: 156 TFLOPS (312 TFLOPS with sparsity)
  • BFLOAT16 Tensor Core: 312 TFLOPS (624 TFLOPS with sparsity)
  • FP16 Tensor Core: 312 TFLOPS (624 TFLOPS with sparsity)
  • INT8 Tensor Core: 624 TOPS (1,248 TOPS with sparsity)

Memory, Platform & Form Factor

  • GPU memory: 40GB HBM2
  • Memory bandwidth: 1,555 GB/s
  • Max thermal design power: 250W
  • Multi-Instance GPU: Up to 7 @ 5GB
  • Interconnect: NVLink 600 GB/s; PCIe Gen4 64 GB/s
  • Form factor: PCIe
  • GPU architecture: NVIDIA Ampere

Specifications per the NVIDIA A100 Tensor Core GPU datasheet (r4). Peak rates marked “with sparsity” require structural-sparsity-enabled models.
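
For a feel of what the two interconnect figures mean in practice, the sketch below computes the best-case time to move a payload over each link at its quoted peak rate. These are peak figures and are typically quoted as bidirectional totals, so real one-way transfers will be slower; `transfer_seconds` is a hypothetical helper for illustration.

```python
# Peak interconnect rates from the specification list, in GB/s.
LINKS_GB_PER_S = {"NVLink bridge": 600, "PCIe Gen4": 64}


def transfer_seconds(payload_gb, link_gb_per_s):
    """Best-case transfer time in seconds at the link's quoted peak rate."""
    return payload_gb / link_gb_per_s


if __name__ == "__main__":
    payload_gb = 40  # e.g. the full contents of one card's memory
    for link, rate in LINKS_GB_PER_S.items():
        ms = transfer_seconds(payload_gb, rate) * 1000
        print(f"{link:>13}: {ms:6.1f} ms for {payload_gb} GB at peak")
```

At peak rates, 40 GB takes 0.625 s over PCIe Gen4 and about 0.067 s over the NVLink bridge, which is the gap that matters for workloads split across two cards.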

Performance

Built to Accelerate

Each figure compares the A100 against a different reference point, as published by NVIDIA. Note the baseline beneath each number.

20×

Higher performance

vs the prior NVIDIA Volta generation, across AI training and inference.

245×

AI inference throughput

BERT-Large inference vs a CPU-only server (INT8 with sparsity).

11×

HPC throughput

Across top HPC apps vs P100, a four-year generational gain.

2×

Sparse-model speed-up

From structural sparsity in Tensor Cores, primarily for inference.

Where It Fits

The Right Card for the Job

Why run it on mCloud

Australian, on-demand, supported

  • Tier IV data centre: Australia's first, with redundant power and high availability.
  • Fast underlying platform: NVMe storage and 100Gbps networking feed the GPU.
  • 24/7 Australian support: local cloud specialists for deployment and troubleshooting.
  • Pay for what you use: scale instances up or down to match demand.
  • OpenStack & IaC ready: automate provisioning with the API and Terraform.

Deploy the A100 40GB today

Spin up GPU compute on our Tier IV Australian cloud, or talk to our specialists about sizing the right configuration for your workload.

 
