Microsoft Research Blog

Research Focus: Week of November 7, 2022 

November 8, 2022

Welcome to Research Focus, a new series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft. Barun Patra, Saksham Singhal, Shaohan Huang, Zewen Chi, Li Dong, Furu Wei,…

Image: three bar plots. The first shows that XTC-BERT is 32x smaller than BERT at nearly identical accuracy (83.44 vs. 83.95). The second shows that INT8 inference with ZeroQuant is 2.6x faster than the FP16 PyTorch baseline and cuts the GPUs needed for inference from 2 to 1, for 5.2x overall efficiency, at comparable accuracy (50.4 vs. 50.5). The third shows that ZeroQuant compresses a model more than 5,000x more cheaply than the baseline, with accuracy of 42.26 vs. the baseline's 42.35.
Microsoft Research Blog

DeepSpeed Compression: A composable library for extreme compression and zero-cost quantization 

July 20, 2022 | DeepSpeed Team and Andrey Proskurin

Large-scale models are revolutionizing deep learning and AI research, driving major improvements in language understanding, creative text generation, multilingual translation, and more. But despite their remarkable capabilities, the models’ large size creates latency and cost constraints that hinder the…
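The zero-cost quantization highlighted above centers on converting FP16/FP32 weights to INT8 without retraining. DeepSpeed's actual pipeline (ZeroQuant) uses finer-grained group-wise weight and token-wise activation schemes, but a minimal sketch of the underlying symmetric INT8 quantization idea, written in plain PyTorch rather than the DeepSpeed API, might look like this:

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: map max |w| onto 127."""
    scale = w.abs().max() / 127.0
    q = torch.clamp(torch.round(w / scale), min=-127, max=127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an FP32 approximation of the original weights."""
    return q.to(torch.float32) * scale

w = torch.randn(1024, 1024)        # stand-in for an FP16/FP32 weight matrix
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(f"max abs quantization error: {(w - w_hat).abs().max():.5f}")
```

Storing `q` and a single `scale` shrinks the weight footprint by 2x versus FP16 (4x versus FP32), which is the mechanism behind the inference speed and GPU-count reductions shown in the figure.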

In the news | ZDNet

Microsoft improves Translator and Azure AI services with new AI ‘Z-code’ models 

March 22, 2022

Microsoft is updating its Translator and other Azure AI services with a set of AI models called Z-code, officials announced on March 22. These updates will improve the quality of machine translations, as well as help these services support more…

DeepSpeed shares findings and innovations for MoE models and systems that (1) reduce training cost by 5x, (2) reduce MoE parameter size by up to 3.7x, and (3) reduce MoE inference latency by 7.3x at unprecedented scale, while offering up to 4.5x faster and 9x cheaper inference for MoE models compared to quality-equivalent dense models.
Microsoft Research Blog

DeepSpeed: Advancing MoE inference and training to power next-generation AI scale 

January 19, 2022 | DeepSpeed Team and Andrey Proskurin

In the last three years, the largest trained dense models have increased in size by over 1,000 times, from a few hundred million parameters to over 500 billion parameters in Megatron-Turing NLG 530B (MT-NLG). Improvements in model quality with size…
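The MoE savings cited above come from sparse activation: a gating network routes each token to one (or a few) expert feed-forward networks, so per-token compute stays near a dense model's while total parameter count grows with the number of experts. Below is a minimal, illustrative top-1-gated MoE layer in plain PyTorch; it is not DeepSpeed's implementation, which adds expert parallelism, load balancing, and optimized kernels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Minimal top-1 gated mixture-of-experts layer (illustrative only)."""
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        probs = F.softmax(self.gate(x), dim=-1)
        weight, idx = probs.max(dim=-1)         # route each token to its top expert
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                     # only this expert's tokens run here
            if mask.any():
                out[mask] = weight[mask, None] * expert(x[mask])
        return out

moe = Top1MoE(d_model=512, d_ff=2048, n_experts=8)
y = moe(torch.randn(16, 512))   # 8x the FFN parameters, ~1x the FFN compute per token
```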

SuperGLUE leaderboards showing T-NLRv5 at the top
Microsoft Research Blog

Efficiently and effectively scaling up language model pretraining for best language representation model on GLUE and SuperGLUE 

December 2, 2021 | Jianfeng Gao and Saurabh Tiwary

As part of Microsoft AI at Scale, the Turing family of NLP models is being used at scale across Microsoft to enable the next generation of AI experiences. Today, we are happy to announce that the…

Z-Code multilingual model

In the news | Microsoft Translator Blog

Multilingual translation at scale: 10000 language pairs and beyond 

November 22, 2021

Microsoft is on a quest for AI at Scale with high ambition to enable the next generation of AI experiences. The Microsoft Translator ZCode team is working together with Microsoft Project Turing and Microsoft Research Asia to advance language and…

An illustration of how the image-text contrastive and translated-text contrastive tasks work together to align the spaces of images, English text, and non-English text: image-caption training data with an image-text contrastive loss pulls the image and English domains together, parallel-corpus training data with a translated-text contrastive loss pulls the English and non-English domains together, and the resulting effect is that all three domains intersect.
Microsoft Research Blog

Turing Bletchley: A Universal Image Language Representation model by Microsoft 

November 1, 2021 | Saurabh Tiwary

Today, the Microsoft Turing team is thrilled to introduce Turing Bletchley, a 2.5-billion parameter Universal Image Language Representation model (T-UILR) that can perform image-language tasks in 94 languages. T-Bletchley has an image encoder and a universal language encoder that vectorize…
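The alignment mechanism described in the figure above rests on contrastive losses: an image-text loss pulls images toward their English captions, and a translated-text loss pulls parallel sentences in different languages together, transitively aligning all three spaces. A minimal sketch of such a symmetric, CLIP-style InfoNCE objective follows, with hypothetical embedding shapes and a commonly used temperature; it is not the Turing team's actual training code:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07):
    """Symmetric InfoNCE over a batch of paired embeddings (a_i, b_i)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))      # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Image-caption pairs align images with English text; parallel-corpus pairs
# align English with non-English text, so all three domains end up shared.
img, en, xx = torch.randn(32, 256), torch.randn(32, 256), torch.randn(32, 256)
loss = contrastive_loss(img, en) + contrastive_loss(en, xx)
```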

Figure 1. Trend of sizes of state-of-the-art NLP models over time
Microsoft Research Blog

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World’s Largest and Most Powerful Generative Language Model 

October 11, 2021 | Ali Alvi and Paresh Kharya

We are excited to introduce the DeepSpeed- and Megatron-powered Megatron-Turing Natural Language Generation model (MT-NLG), the largest and the most powerful monolithic transformer language model trained to date, with 530 billion parameters. It is the result of a research collaboration…

XTREME leaderboard showing T-ULRv5 at the top.
Microsoft Research Blog

Microsoft Turing Universal Language Representation model, T-ULRv5, tops XTREME leaderboard and trains 100x faster 

September 28, 2021 | Saurabh Tiwary and Lidong Zhou

Today, we are excited to announce that with our latest Turing universal language representation model (T-ULRv5), a Microsoft-created model is once again the state of the art and at the top of the Google XTREME public leaderboard…
