Practical applicability of deep neural networks for overlapping speaker separation

Cited by: 6

Authors:

Appeltans, Pieter [1]

Zegers, Jeroen [2]

Van Hamme, Hugo [2]

Affiliations:

[1] Katholieke Univ Leuven, Dept Comp Sci, Leuven, Belgium

[2] Katholieke Univ Leuven, ESAT, Proc Speech & Images ESAT, Leuven, Belgium

Source:

INTERSPEECH 2019 | 2019

Keywords:

Source separation; Recurrent neural networks; Artificial neural networks; Audio

DOI:

10.21437/Interspeech.2019-1807

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

This paper examines the applicability in realistic scenarios of two deep-learning-based solutions to the overlapping speaker separation problem. First, it presents experiments showing that these methods are applicable to a broad range of languages. Further experiments indicate limited performance loss for untrained languages when these share common features with the trained language(s). Second, it investigates how the methods deal with realistic background noise and proposes some modifications to better cope with these disturbances. The deep learning methods examined are deep clustering and deep attractor networks.

Citation

Pages: 1353-1357

Page count: 5

