Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition

Takuya Yoshioka; Hakan Erdogan; Zhuo Chen; Fil Alleva

ICASSP 2018 | April 2018

Published by IEEE

This paper describes a neural network approach to far-field speech separation using multiple microphones. The proposed approach is speaker-independent and can learn to implicitly determine the number of speakers in an input speech mixture. This is achieved with the permutation invariant training (PIT) framework, which was recently proposed for single-microphone speech separation. In this paper, PIT is extended to leverage multi-microphone input effectively, and it is combined with beamforming for better recognition accuracy. The effectiveness of the approach is evaluated in multi-talker speech recognition experiments that use a large quantity of training data and cover a range of mixing conditions. The multi-microphone speech separation system significantly outperforms single-microphone PIT. Several aspects of the proposed approach are also investigated experimentally.
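As a rough illustration of the PIT idea mentioned in the abstract, the sketch below computes a loss that does not depend on the order in which the network emits its output channels: it evaluates every assignment of outputs to reference speakers and keeps the lowest error. This is a minimal sketch and not the paper's implementation. The function name `pit_mse_loss`, the mean-squared-error criterion, and the `(num_speakers, frames, bins)` array layout are all assumptions made for illustration; the abstract does not specify the loss or the features used.

```python
import itertools

import numpy as np


def pit_mse_loss(estimates, references):
    """Return the lowest MSE over all output-to-speaker assignments.

    Both arguments are arrays of shape (num_speakers, frames, bins).
    The shape convention and the MSE criterion are illustrative assumptions.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in itertools.permutations(range(num_speakers)):
        # Pair output channel i with reference speaker perm[i].
        loss = np.mean((estimates - references[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: the outputs match the references but in swapped order.
rng = np.random.default_rng(0)
references = rng.standard_normal((2, 4, 3))
estimates = references[[1, 0]]
loss, perm = pit_mse_loss(estimates, references)
```

The exhaustive search over permutations grows factorially with the number of output channels, which is acceptable for the two or three channels typical of this kind of sketch.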