Projects
- Language Understanding: Don’t just recognize the words a user spoke, but understand what they mean.
- Noise Robustness: How do we make the system work when background noise is present?
- Voice search: Users can search for information, such as a business, from their phone.
- Automatic Grammar Induction: How do we create grammars to ease the development of spoken language systems?
- (MiPad) Multimodal Interactive Pad: Our first multimodal prototype.
- SALT (Speech Application Language Tags): A markup language for the multimodal web.
- From Captions to Visual Concepts and Back: Image captioning and understanding
- Intent Understanding: Don’t just recognize the words the user says, but understand what they mean.
- Multimodal Conversational User Interface
- Personalized Language Model for improved accuracy
- Recurrent Neural Networks for Language Processing
- Speech Technology for Computational Phonetics and Reading Assessment
- (Whisper) Speech Recognition: Our previous dictation-oriented speech recognition project is a state-of-the-art general-purpose speech recognizer.
- (WhisperID) Speaker Identification: Who is doing the talking?
- Speech Application Programming Interface (SAPI) Development Toolkit: Developers can use the Whisper speech recognizer to build applications with speech recognition.
Current Projects
Neural Codec Language Model as a Versatile Speech Transformer: SpeechX is a versatile speech generation model leveraging audio and text prompts, which can deal with both clean and noisy speech inputs and perform zero-shot TTS and various tasks involving transforming…
Project Z-Code, a part of Azure AI Cognitive Services, is working within Project Turing to evolve Microsoft products with the adoption of deep learning pre-trained models.
Project Florence (AI) is a Microsoft AI Cognitive Services initiative to advance state-of-the-art computer vision technologies and develop the next-generation framework for visual recognition.
Microsoft Azure Florence-VL aims to develop state-of-the-art vision-language learning technologies to endow computers with an ability to effectively learn from multi-modality data.
Synapse Machine Learning expands the distributed computing framework Apache Spark in several new directions and brings new networking capabilities to the Spark ecosystem.
The goal of Project Denmark is to move beyond the need for traditional microphone arrays, such as those supported by Microsoft’s Speech Devices SDK, to achieve high-quality capture of meeting conversations.
This ongoing project aims to drive the state of the art in speech recognition toward matching, and ultimately surpassing, humans, with a focus on unconstrained conversational speech. The goal is a moving target as the scope of the task is…
We want to use eye gaze and face pose to understand what users are looking at and attending to, and to use this information to improve speech recognition. Any sort of language constraint makes speech recognition and understanding easier…
Conversational systems interact with people through language to assist, enable, or entertain. Research at Microsoft spans dialogs that use language exclusively, or in conjunction with additional modalities like gesture; where language is spoken or in text; and in a variety…