Stanford Megatron-LM compatibility#

2025-09-11

3 min read time

Applies to Linux

Stanford Megatron-LM is a large-scale language model training framework based on NVIDIA's Megatron-LM (NVIDIA/Megatron-LM). It is designed to train massive transformer-based language models efficiently through model and data parallelism.
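The two parallelism strategies can be sketched with a toy example in plain Python. This is a conceptual illustration only, with hypothetical helper names; Megatron-LM implements these ideas with fused GPU kernels and collective communication (NCCL/RCCL), not Python lists.

```python
# Toy illustration of tensor (model) and data parallelism for a linear
# layer y = x @ W. Shapes and helpers are illustrative, not Megatron-LM APIs.

def matmul(x, w):
    # x: list of rows; w: list of rows with len(x[0]) == len(w).
    cols = len(w[0])
    return [[sum(xr[k] * w[k][c] for k in range(len(w))) for c in range(cols)]
            for xr in x]

def split_columns(w, parts):
    # Tensor parallelism: each "device" holds a column shard of W.
    n = len(w[0]) // parts
    return [[row[i * n:(i + 1) * n] for row in w] for i in range(parts)]

def tensor_parallel_forward(x, w, parts):
    shards = split_columns(w, parts)
    # Each shard computes its slice of the output; an all-gather
    # concatenates the slices along the column dimension.
    outs = [matmul(x, s) for s in shards]
    return [sum((o[r] for o in outs), []) for r in range(len(x))]

def data_parallel_average(grads):
    # Data parallelism: each replica computes gradients on its own batch
    # shard; an all-reduce averages them across replicas.
    n = len(grads)
    return [sum(g) / n for g in zip(*grads)]

W = [[1, 2, 3, 4], [5, 6, 7, 8]]
X = [[1.0, 1.0]]
# Sharded forward matches the unsharded result.
assert tensor_parallel_forward(X, W, parts=2) == matmul(X, W)
# Two replicas' gradients are averaged element-wise.
assert data_parallel_average([[2.0, 4.0], [4.0, 6.0]]) == [3.0, 5.0]
```

In practice, Megatron-LM composes both schemes: weight shards stay resident on each GPU while data-parallel replicas exchange gradients each step.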

  • ROCm support for Stanford Megatron-LM is hosted in the official ROCm/Stanford-Megatron-LM repository.

  • Because ROCm support is maintained with its own compatibility considerations, this repository differs from the upstream stanford-futuredata/Megatron-LM repository.

  • Use the prebuilt Docker image with ROCm, PyTorch, and Megatron-LM preinstalled.

  • See the ROCm Stanford Megatron-LM installation guide to install and get started.

Note

Stanford Megatron-LM is supported on ROCm 6.3.0.

Supported devices#

  • Officially supported: AMD Instinct MI300X

  • Partially supported (functionality or performance limitations): AMD Instinct MI250X, MI210

Supported models and features#

This section details the models and features supported by Stanford Megatron-LM on ROCm.

Models:

  • BERT

  • GPT

  • T5

  • ICT

Features:

  • Distributed Pre-training

  • Activation Checkpointing and Recomputation

  • Distributed Optimizer

  • Mixture-of-Experts
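The Mixture-of-Experts feature can be illustrated with a minimal top-1 router in plain Python. This is a conceptual sketch, not Megatron-LM's implementation; the gate weights and expert functions here are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of gate scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top1_route(token, gate_weights, experts):
    # Gate: score each expert for this token, then pick the best one.
    scores = [sum(t * w for t, w in zip(token, col)) for col in gate_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    # Only the chosen expert runs; its output is scaled by the gate prob.
    return [probs[best] * v for v in experts[best](token)], best

experts = [lambda t: [x * 2 for x in t],   # expert 0: doubles the input
           lambda t: [x + 1 for x in t]]   # expert 1: increments the input
gate = [[1.0, 0.0], [0.0, 1.0]]            # one score column per expert
out, chosen = top1_route([3.0, 0.0], gate, experts)
assert chosen == 0  # the first gate column scores highest for this token
```

Because only the selected expert executes per token, MoE layers grow parameter count without a proportional increase in compute per step.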

Use cases and recommendations#

See the Efficient MoE training on AMD ROCm: How to use Megablocks on AMD GPUs blog post for guidance on using the Stanford Megatron-LM framework on the ROCm platform to pre-process datasets and run pre-training on AMD GPUs. Coverage includes:

  • Single-GPU pre-training

  • Multi-GPU pre-training
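A multi-GPU launch typically follows upstream Megatron-LM conventions, along these lines. This is a sketch only: the script name, flag names, and values below are assumptions based on upstream Megatron-LM, and the data path is a placeholder; verify the exact arguments against the ROCm/Stanford-Megatron-LM repository.

```shell
# Hypothetical 8-GPU GPT pre-training launch from a Megatron-LM checkout.
# Flags follow upstream Megatron-LM conventions; check the ROCm repository
# for the exact supported arguments.
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --tensor-model-parallel-size 2 \
    --num-layers 24 --hidden-size 1024 --num-attention-heads 16 \
    --micro-batch-size 4 --global-batch-size 64 \
    --seq-length 1024 --max-position-embeddings 1024 \
    --train-iters 1000 \
    --data-path my_dataset_text_document \
    --lr 1.5e-4
```

With `--nproc_per_node=8` and `--tensor-model-parallel-size 2`, the remaining factor of 4 is used for data parallelism across GPU groups.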

Docker image compatibility#

AMD validates and publishes Stanford Megatron-LM Docker images with ROCm and PyTorch backends on Docker Hub. The following Docker image tag and associated inventory represent the latest Megatron-LM version from the official Docker Hub, validated for ROCm 6.3.0. Click to view the image on Docker Hub.

| Docker image | Stanford Megatron-LM | PyTorch | Ubuntu | Python |
|--------------|----------------------|---------|--------|--------|
|              | 85f95ae              | 2.4.0   | 24.04  | 3.12.9 |