Dual-Path RNN For Long Recording Speech Separation

Chenda Li; Yi Luo; Cong Han; Jinyu Li; Takuya Yoshioka; Tianyan Zhou; Marc Delcroix; Keisuke Kinoshita; Christoph Boeddeker; Yanmin Qian; Shinji Watanabe; Zhuo Chen

Spoken Language Technology Workshop | January 2021

Organized by IEEE

Continuous speech separation (CSS) is an emerging task in speech separation that aims to separate overlap-free targets from a long, partially overlapped recording. A straightforward way to extend previously proposed sentence-level separation models to this task is to segment the long recording into fixed-length blocks and separate each block independently. However, such a simple extension does not fully address cross-block dependencies, and the separation performance may not be satisfactory. In this paper, we focus on improving block-level separation performance by exploring methods that utilize cross-block information. Building on the recently proposed dual-path RNN (DPRNN) architecture, we investigate how DPRNN's interleaved intra- and inter-block modules can help block-level separation. Experimental results show that DPRNN can significantly outperform the baseline block-level model in both offline and block-online configurations under certain settings.
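To make the contrast between independent block-level processing and interleaved intra-/inter-block processing concrete, here is a minimal NumPy sketch. It is not the authors' model. Assumptions made here for illustration: blocks are non-overlapping and the tail is zero-padded; a plain unidirectional tanh RNN stands in for the recurrent layers of a real DPRNN, normalization layers are omitted, and the weights are random, so the code shows data flow only, not separation quality. The names `segment`, `unsegment`, `rnn`, `intra_only`, and `dual_path_block` are hypothetical.

```python
import numpy as np


def segment(x, block_len):
    """Split a (T, F) feature sequence into (S, K, F) non-overlapping blocks,
    zero-padding the tail so that S * K >= T (simplifying assumption)."""
    T, F = x.shape
    S = -(-T // block_len)  # ceiling division
    x = np.pad(x, ((0, S * block_len - T), (0, 0)))
    return x.reshape(S, block_len, F)


def unsegment(blocks, T):
    """Inverse of segment(): flatten blocks back to (T, F) and drop the padding."""
    return blocks.reshape(-1, blocks.shape[-1])[:T]


def rnn(seq, W_in, W_h):
    """Unidirectional tanh RNN over axis 0 of an (L, F) sequence.
    Hypothetical stand-in for the recurrent layers of a real DPRNN."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in seq:
        h = np.tanh(W_in @ x_t + W_h @ h)
        outputs.append(h)
    return np.stack(outputs)


def intra_only(blocks, params):
    """Baseline: each block is processed independently (hidden state reset per block)."""
    W_in, W_h = params[0], params[1]
    return blocks + np.stack([rnn(b, W_in, W_h) for b in blocks])


def dual_path_block(blocks, params):
    """One intra-block pass followed by one inter-block pass, each with a residual add."""
    W_in_a, W_h_a, W_in_b, W_h_b = params
    S, K, F = blocks.shape
    # Intra-block path: recurrence along time inside each block (axis 1).
    y = blocks + np.stack([rnn(blocks[s], W_in_a, W_h_a) for s in range(S)])
    # Inter-block path: recurrence across blocks (axis 0) at each within-block index k.
    inter = np.stack([rnn(y[:, k], W_in_b, W_h_b) for k in range(K)], axis=1)
    return y + inter


rng = np.random.default_rng(0)
T, F, K = 10, 4, 3  # frames, feature size, block length (toy values)
x = rng.standard_normal((T, F))
params = [0.5 * rng.standard_normal((F, F)) for _ in range(4)]

blocks = segment(x, K)
out = unsegment(dual_path_block(blocks, params), T)
print(blocks.shape, out.shape)  # (4, 3, 4) (10, 4)

# Perturb only the first frame (which lives in block 0) and inspect the last block.
x2 = x.copy()
x2[0] += 1.0
b2 = segment(x2, K)
print(np.allclose(intra_only(blocks, params)[-1], intra_only(b2, params)[-1]))  # True
print(np.allclose(dual_path_block(blocks, params)[-1], dual_path_block(b2, params)[-1]))  # False
```

The first comparison prints `True` because the independent baseline resets its state in every block, so a change in block 0 cannot reach the last block. The second prints `False` because the inter-block path carries information across blocks, which is the kind of cross-block dependency the abstract refers to.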