Practical applicability of deep neural networks for overlapping speaker separation

Cited by: 6
Authors
Appeltans, Pieter [1]
Zegers, Jeroen [2]
Van Hamme, Hugo [2]
Affiliations
[1] Katholieke Univ Leuven, Dept Comp Sci, Leuven, Belgium
[2] Katholieke Univ Leuven, ESAT, Proc Speech & Images ESAT, Leuven, Belgium
Source
INTERSPEECH 2019 | 2019
Keywords
Source Separation; Recurrent neural networks; Artificial neural networks; AUDIO
DOI
10.21437/Interspeech.2019-1807
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
This paper examines the applicability in realistic scenarios of two deep learning based solutions to the overlapping speaker separation problem. Firstly, we present experiments showing that these methods are applicable to a broad range of languages. Further experimentation indicates limited performance loss for untrained languages when these share common features with the trained language(s). Secondly, we investigate how the methods deal with realistic background noise and propose some modifications to better cope with these disturbances. The deep learning methods examined are deep clustering and deep attractor networks.
Pages: 1353 - 1357
Number of pages: 5