A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation

Cited by: 437
Authors
Gannot, Sharon [1]
Vincent, Emmanuel [2]
Markovich-Golan, Shmulik [1]
Ozerov, Alexey [3]
Affiliations
[1] Bar Ilan Univ, IL-5290002 Ramat Gan, Israel
[2] Inria, F-54600 Nancy, France
[3] Technicolor R&D, F-35576 Cesson Sevigne, France
Keywords
Array processing; beamforming; expectation-maximization; independent component analysis; multichannel; postfiltering; sparse component analysis; Wiener filter; AUDIO SOURCE SEPARATION; BLIND SOURCE SEPARATION; NONNEGATIVE MATRIX FACTORIZATION; INDEPENDENT COMPONENT ANALYSIS; ROOM IMPULSE RESPONSES; NOISE-REDUCTION; FREQUENCY-DOMAIN; CONVOLUTIVE MIXTURES; MICROPHONE ARRAYS; MAXIMUM-LIKELIHOOD
DOI
10.1109/TASLP.2016.2647702
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, or hearing aids. In addition, they are crucial preprocessing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multi-channel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet, a comprehensive overview of the common foundations and the differences between these approaches is lacking at present. In this paper, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: 1) the acoustic impulse response model, 2) the spatial filter design criterion, 3) the parameter estimation algorithm, and 4) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
Pages: 692-730
Page count: 39