ZeRO & DeepSpeed: New system optimizations enable training models with over 100 billion parameters
The latest trend in AI is that larger natural language models provide better accuracy; however, larger models are difficult to train because of their cost, training time, and the difficulty of integrating the required code changes. Microsoft is releasing an open-source library called DeepSpeed, which vastly advances large model training by improving scale, speed, cost, and usability, unlocking the ability to train models with over 100 billion parameters.