Microsoft DeepSpeed lets 1 GPU train 40 billion parameter neural net

--

Microsoft Releases AI Training Library ZeRO-3 Offload
InfoQ article https://www.infoq.com/news/2021/04/microsoft-zero3-offload

DeepSpeed ZeRO-3 Offload
Microsoft DeepSpeed blog https://www.deepspeed.ai/news/2021/03/07/zero3-offload.html

GitHub https://github.com/microsoft/DeepSpeed

The DeepSpeed team outlined the features and benefits of the release in a recent blog post. ZeRO-3 Offload increases the memory efficiency of distributed training for deep-learning models built on the PyTorch framework and provides super-linear scaling across multiple GPUs. By offloading some model state from GPU memory to CPU memory, it allows a larger model size per GPU, enabling models of up to 40 billion parameters to be trained on a single GPU. Adopting the DeepSpeed framework requires minimal refactoring of model code, and existing users can enable the new features by modifying a config file.
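
As a rough illustration (not taken from the Microsoft post), here is a minimal sketch of how ZeRO-3 Offload is typically switched on through the DeepSpeed config, passed here as a Python dict to deepspeed.initialize. The toy model and all hyperparameter values are placeholders; in practice the config usually lives in a JSON file and the script is launched with the deepspeed launcher.

# Minimal sketch: enabling ZeRO-3 Offload via the DeepSpeed config.
# Model and hyperparameters are placeholders, not from the article.
# Launch with the deepspeed launcher, e.g.: deepspeed train.py
import torch
import deepspeed

model = torch.nn.Sequential(        # stand-in for a large model
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,                              # ZeRO stage 3: partition optimizer states, gradients, and parameters
        "offload_param": {"device": "cpu"},      # keep parameter shards in CPU memory
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in CPU memory
    },
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# deepspeed.initialize wraps the model in a DeepSpeed engine; the training
# loop is otherwise unchanged apart from engine.backward()/engine.step().
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

x = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()  # dummy loss for illustration
engine.backward(loss)
engine.step()

This is what "modifying a config file" means in practice: the zero_optimization block selects the ZeRO stage and the offload targets, while the model code itself stays essentially untouched.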

Stay up to date. Subscribe to my posts https://morrislee1234.wixsite.com/website/contact
Web site with my other posts by category https://morrislee1234.wixsite.com/website

LinkedIn https://www.linkedin.com/in/morris-lee-47877b7b

Photo by Alessio Lin on Unsplash

--



Written by AI News Clips by Morris Lee: News to help your R&D

A computer vision consultant in artificial intelligence and related high-tech fields for 37+ years. An innovator with 66+ patents, ready to help a firm's R&D.
