[Image: collage of a VR haptic pivot device, Ashley Llorens of Microsoft Research, a tractor on a farm, and the Race and Technology lecture series speakers.]
Microsoft Research Blog

Research at Microsoft 2021: Collaborating for real-world change 

December 15, 2021

Over the past 30 years, Microsoft Research has undergone a shift in how it approaches innovation, broadening its mission to include not only advancing the state of computing but also using technology to tackle some of the world’s most pressing…

[Figure: end-to-end throughput of Meta's MoE language model on Azure NDm A100 v4 VMs, from 8 to 512 A100 (80 GB) GPUs; Tutel consistently achieves higher throughput than fairseq.]
Microsoft Research Blog

Tutel: An efficient mixture-of-experts implementation for large DNN model training 

November 22, 2021 | Wei Cui, Yifan Xiong, Peng Cheng, and Rafael Salas

Mixture of experts (MoE) is a deep learning model architecture in which computational cost grows sublinearly with the number of parameters, making scaling easier. Today, MoE is the only approach demonstrated to scale deep learning models to trillion-plus parameters, paving…
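The "sublinear" claim in this excerpt comes from sparse activation: a router sends each token to only a small subset of experts, so adding experts grows the parameter count without growing per-token compute in proportion. As a rough illustration of the idea only (a generic sketch with hypothetical names, not Tutel's implementation), a top-1-routed MoE layer can be written as:

    # Illustrative top-1 mixture-of-experts layer (generic PyTorch, not Tutel's code).
    # All class and variable names here are hypothetical.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoE(nn.Module):
        def __init__(self, d_model: int, d_hidden: int, num_experts: int):
            super().__init__()
            self.router = nn.Linear(d_model, num_experts)      # gating network
            self.experts = nn.ModuleList([
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            ])

        def forward(self, x):                                  # x: (num_tokens, d_model)
            scores = F.softmax(self.router(x), dim=-1)         # routing probabilities
            top_score, top_idx = scores.max(dim=-1)            # pick one expert per token
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                sel = top_idx == e                             # tokens routed to expert e
                if sel.any():
                    out[sel] = top_score[sel].unsqueeze(-1) * expert(x[sel])
            return out

Each token passes through only one expert FFN, so per-token compute stays roughly constant while total parameters grow with the number of experts. Tutel's contribution is making the routing, all-to-all token exchange, and expert dispatch efficient when the experts are spread across hundreds of GPUs; the loop above is purely didactic.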

[Image: Z-code multilingual model]

In the news | Microsoft Translator Blog

Multilingual translation at scale: 10,000 language pairs and beyond 

November 22, 2021

Microsoft is on a quest for AI at Scale with high ambition to enable the next generation of AI experiences. The Microsoft Translator ZCode team is working together with Microsoft Project Turing and Microsoft Research Asia to advance language and…

[Figure: trend of sizes of state-of-the-art NLP models over time]
Microsoft Research Blog

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model 

October 11, 2021 | Ali Alvi and Paresh Kharya

We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration…

In the news | VentureBeat

Microsoft taps AI techniques to bring Translator to 100 languages 

October 11, 2021

Today, Microsoft announced that Microsoft Translator, its AI-powered text translation service, now supports more than 100 different languages and dialects. With the addition of 12 new languages including Georgian, Macedonian, Tibetan, and Uyghur, Microsoft claims that Translator can now make…

[Figure: DeepSpeed MoE supports 8x larger models with expert parallelism + ZeRO-Offload than with expert parallelism alone, scales near-linearly with the number of NVIDIA A100 GPUs, and Z-code MoE (10B) consistently outperforms other systems on BLEU for an in-house 50-language test set.]
Microsoft Research Blog

DeepSpeed powers 8x larger MoE model training with high performance 

August 18, 2021 | DeepSpeed Team and Z-code Team

Today, we are proud to announce DeepSpeed MoE, a high-performance system that supports massive scale mixture of experts (MoE) models as part of the DeepSpeed optimization library. MoE models are an emerging class of sparsely activated…
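In practice, the pattern the post describes is replacing a transformer block's dense feed-forward layer with DeepSpeed's MoE layer, which replicates an expert module and shards the resulting experts across GPUs (expert parallelism). The sketch below is hedged: the constructor arguments follow DeepSpeed's MoE API as documented around this time, but they are assumptions here and may differ between versions.

    # Hedged sketch: wrapping an ordinary FFN block in DeepSpeed's MoE layer.
    # Argument names are assumptions based on the DeepSpeed MoE docs of this period;
    # check the documentation for your installed DeepSpeed version.
    import torch.nn as nn
    from deepspeed.moe.layer import MoE

    hidden = 1024
    expert_ffn = nn.Sequential(                 # one expert: a standard FFN block
        nn.Linear(hidden, 4 * hidden),
        nn.GELU(),
        nn.Linear(4 * hidden, hidden),
    )

    moe_ffn = MoE(
        hidden_size=hidden,
        expert=expert_ffn,                      # replicated to create the expert pool
        num_experts=64,                         # experts are sharded across GPUs
        k=1,                                    # top-1 gating: one expert per token
    )

    # In real use this sits inside a model launched with the deepspeed launcher and
    # deepspeed.initialize(), which sets up the expert-parallel process groups;
    # the layer's forward also returns an auxiliary load-balancing loss.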

[Figure: DeepSpeed speedups, including up to 6.9x multi-GPU inference throughput for large models, 2.8x faster convergence with Progressive Layer Dropping, up to 4.6x less communication overhead with 1-bit LAMB, single-GPU inference speedups of 1.9x–4.4x across BERT and GPT-family models, and multi-GPU inference speedups of 6.2x for Turing-NLG and 3.7x for a 175-billion-parameter language model.]
Microsoft Research Blog

DeepSpeed: Accelerating large-scale model inference and training via system optimizations and compression 

May 24, 2021 | DeepSpeed Team, Rangan Majumder, and Andrey Proskurin

Last month, the DeepSpeed Team announced ZeRO-Infinity, a step forward in training models with tens of trillions of parameters. In addition to creating optimizations for scale, our team strives to introduce features that also improve speed, cost, and usability. As…
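On the inference side, the post's optimizations are exposed through a single entry point that injects fused transformer kernels and, when more than one GPU is used, slices the model's weights across them. The sketch below is an assumption-laden example based on the DeepSpeed inference examples of this period; argument names may have changed in later releases.

    # Hedged sketch: accelerating a Hugging Face causal LM with DeepSpeed inference.
    # Treat the init_inference arguments as assumptions from this era's examples and
    # verify them against the docs for your DeepSpeed version. Requires a CUDA GPU.
    import torch
    import deepspeed
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"                          # small stand-in for a large model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    engine = deepspeed.init_inference(
        model,
        mp_size=1,                               # GPUs to shard the model across
        dtype=torch.half,                        # run the injected kernels in FP16
        replace_with_kernel_inject=True,         # swap in DeepSpeed's fused kernels
    )

    prompt = tokenizer("DeepSpeed accelerates", return_tensors="pt").to("cuda")
    output = engine.module.generate(**prompt, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))

With mp_size greater than 1, the same call tensor-slices the model across GPUs, which is where the multi-GPU throughput gains summarized in the figure above come from.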

In the news | The Batch

Toward 1 Trillion Parameters 

September 16, 2020

An open source library could spawn trillion-parameter neural networks and help small-time developers build big-league models. What’s new: Microsoft upgraded DeepSpeed, a library that accelerates the PyTorch deep learning framework. The revision makes it possible to train models five times…

In the news | Analytics India Magazine

Microsoft Releases Latest Version Of DeepSpeed, Its Python Library For Deep Learning Optimisation 

September 15, 2020

Recently, Microsoft announced new advancements in the popular deep learning optimisation library known as DeepSpeed. This library is an important part of Microsoft’s new AI at Scale initiative to enable next-generation AI capabilities at scale.
