NVIDIA, Stanford, Microsoft Efficient Trillion-Parameter Model Training on GPU Clusters

NVIDIA, Stanford & Microsoft Propose Efficient Trillion-Parameter Language Model Training on GPU Clusters
Synced article https://syncedreview.com/2021/04/15/nvidia-stanford-microsoft-propose-efficient-trillion-parameter-language-model-training-on-gpu-clusters

Efficient Large-Scale Language Model Training on GPU Clusters
arXiv paper abstract https://arxiv.org/abs/2104.04473?context=cs.CL
arXiv PDF paper https://arxiv.org/pdf/2104.04473.pdf
GitHub https://github.com/nvidia/megatron-lm

… In this work, we show how to compose different types of parallelism methods (tensor, pipeline, and data parallelism) to scale to thousands of GPUs, achieving a two-order-of-magnitude increase in the sizes of models we can efficiently train compared to existing systems. … The composition of these techniques allows us to perform training iterations on a model with 1 trillion parameters at 502 petaFLOP/s on 3072 GPUs with achieved per-GPU throughput of 52% of peak; previous efforts to train similar-sized models achieve much lower throughput (36% of theoretical peak).
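To make the composition concrete, here is a minimal back-of-the-envelope sketch (not from the paper's code) of how tensor, pipeline, and data parallelism degrees multiply to cover a cluster. The specific degrees below are assumed values chosen only so that their product matches the 3072 GPUs quoted in the abstract; the throughput arithmetic uses only the numbers stated there.

```python
# Illustrative sketch: composing parallelism degrees across a GPU cluster.
# The three degrees below are hypothetical, chosen so their product is 3072.

tensor_parallel_size = 8      # split each layer's weight matrices across 8 GPUs
pipeline_parallel_size = 64   # split the layer stack into 64 pipeline stages
data_parallel_size = 6        # replicate the resulting model shards 6 times

total_gpus = tensor_parallel_size * pipeline_parallel_size * data_parallel_size
assert total_gpus == 3072  # cluster size quoted in the abstract

# Throughput check against the abstract's figures:
# 502 petaFLOP/s aggregate over 3072 GPUs is roughly 163 teraFLOP/s per GPU,
# which is about 52% of an A100's ~312 teraFLOP/s half-precision peak.
per_gpu_tflops = 502e3 / total_gpus
print(f"per-GPU throughput: ~{per_gpu_tflops:.0f} TFLOP/s")
```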

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Christian Wiediger on Unsplash

--
AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech technologies for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.