Real-time Single-channel Speech Enhancement with Recurrent Neural Networks
Single-channel speech enhancement using deep neural networks (DNNs) has shown promising progress in recent years. In this work, we explore several aspects of neural network training that impact the objective quality of enhanced speech in a real-time setting. In particular, we base all studies on a novel recurrent neural network that enhances full-band short-time speech spectra on a single-frame-in, single-frame-out basis, the framework adopted by most classical signal-processing methods. We propose two novel learning objectives that allow separate control over expected speech distortion versus noise suppression. Moreover, we study the effect of feature normalization and sequence lengths on the objective quality of enhanced speech. Finally, we compare our method with state-of-the-art approaches based on statistical signal processing and on deep learning.
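To make the setup concrete, below is a minimal sketch of the kind of system the abstract describes: a recurrent network that maps one noisy short-time spectral frame to a per-bin gain each step, trained with a loss that weighs speech distortion against residual noise. This is not the speaker's implementation; the GRU architecture, layer sizes, gain-based formulation, and the weighting factor `alpha` are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class FrameWiseEnhancer(nn.Module):
    """One STFT frame in, one gain vector out; the GRU state carries context."""
    def __init__(self, n_bins: int = 257, hidden: int = 256):
        super().__init__()
        self.gru = nn.GRU(n_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, log_mag, state=None):
        # log_mag: (batch, frames, n_bins); frames == 1 in streaming inference
        h, state = self.gru(log_mag, state)
        gain = self.out(h)  # per-bin suppression gain in [0, 1]
        return gain, state

def weighted_sd_ns_loss(gain, speech_mag, noise_mag, alpha=0.35):
    """Illustrative loss with separate speech-distortion and noise terms.

    speech_distortion: clean-speech energy removed by the gain.
    residual_noise: noise energy left after applying the gain.
    alpha trades one against the other; its value here is a placeholder.
    """
    speech_distortion = ((1.0 - gain) * speech_mag).pow(2).mean()
    residual_noise = (gain * noise_mag).pow(2).mean()
    return alpha * speech_distortion + (1.0 - alpha) * residual_noise
```

In streaming use, one frame is fed at a time and `state` is carried across calls, matching the single-frame-in, single-frame-out setting; training would typically run over longer sequences, which is where the sequence-length question mentioned in the abstract arises.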
Speaker Details
Yangyang (Raymond) Xia received his BSc and MSc in Electrical and Computer Engineering from Carnegie Mellon University in 2015 and 2016, respectively. He is now a PhD candidate in CMU's Robust Speech Recognition Group, where he works on robust speech enhancement and speaker identification methods that combine speech signal processing and deep learning, under the supervision of Professor Richard M. Stern.
- Date:
- Speakers:
- Yangyang (Raymond) Xia
- Affiliation:
- Carnegie Mellon University