Deep learning-based stereophonic acoustic echo suppression without decorrelation

Cited by: 11
Authors
Cheng, Linjuan [1]
Peng, Renhua [1]
Li, Andong [1]
Zheng, Chengshi [1]
Li, Xiaodong [1]
Affiliation
[1] Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
Keywords
Adaptive filtering algorithm; Cancellation
DOI
10.1121/10.0005757
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Traditional stereophonic acoustic echo cancellation algorithms need to estimate the acoustic echo paths from the stereo loudspeakers to a microphone, an estimation that often suffers from the nonuniqueness problem caused by the high correlation between the two far-end signals driving these loudspeakers. Many decorrelation methods have been proposed to mitigate this problem, but they may degrade audio quality and/or stereophonic spatial perception. This paper proposes to use a convolutional recurrent network (CRN) to suppress the stereophonic echo components by estimating a nonlinear gain, which is multiplied by the complex spectrum of the microphone signal to obtain the estimated near-end speech without any decorrelation procedure. The CRN consists of an encoder-decoder module and a two-layer gated recurrent network module, so it exploits both the feature-extraction capability of convolutional neural networks and the temporal-modeling capability of recurrent neural networks. The magnitude spectra of the two far-end signals are used directly as input features, without decorrelation preprocessing, so both audio quality and stereophonic spatial perception are preserved. Experimental results in both simulated and real acoustic environments show that the proposed algorithm outperforms traditional algorithms such as the normalized least-mean-square (NLMS) and Wiener algorithms, especially at low signal-to-echo ratios and long reverberation times (RT60). (C) 2021 Author(s). All article content, except where otherwise noted, is licensed under a Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
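As a rough illustration of the processing chain the abstract describes (far-end magnitude spectra in, real-valued gain out, gain multiplied onto the complex microphone spectrum), here is a minimal NumPy sketch. Several details are assumptions made only for illustration and are not taken from the paper: the STFT settings, the choice to stack the microphone magnitude as an extra input channel, and the hypothetical `toy_gain` heuristic standing in for the trained CRN. The sketch does not implement the network itself.

```python
import numpy as np

def stft(x, n_fft=320, hop=160):
    # Hann-windowed short-time Fourier transform, shape (frames, n_fft//2 + 1).
    # n_fft and hop are illustrative choices, not the paper's settings.
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def build_features(mic, far_left, far_right):
    # Magnitude spectra of both far-end channels, used as-is (no decorrelation).
    # Stacking the microphone magnitude as a third channel is an assumption.
    mags = [np.abs(stft(s)) for s in (mic, far_left, far_right)]
    return np.stack(mags, axis=0)  # shape: (3, frames, bins)

def suppress_echo(mic, far_left, far_right, gain_model):
    # gain_model is a hypothetical stand-in for the trained CRN: it maps the
    # feature tensor to a real-valued gain of shape (frames, bins).
    mic_spec = stft(mic)
    gain = np.clip(gain_model(build_features(mic, far_left, far_right)), 0.0, 1.0)
    return gain * mic_spec  # estimated near-end complex spectrum

def toy_gain(features):
    # Toy heuristic gain (NOT the paper's network): attenuate bins where the
    # summed far-end magnitude is large relative to the microphone magnitude.
    mic_mag, left_mag, right_mag = features
    return 1.0 - (left_mag + right_mag) / (mic_mag + left_mag + right_mag + 1e-8)

# Usage with synthetic signals; the "echo" is a toy instantaneous mix,
# not a convolution with real echo paths.
rng = np.random.default_rng(0)
far_l = rng.standard_normal(16000)
far_r = 0.9 * far_l + 0.1 * rng.standard_normal(16000)  # highly correlated stereo pair
mic = 0.5 * far_l + 0.3 * far_r + 0.1 * rng.standard_normal(16000)
est = suppress_echo(mic, far_l, far_r, toy_gain)
print(est.shape)  # (99, 161)
```

Because the raw, highly correlated far-end spectra go straight into `gain_model`, the sketch mirrors the abstract's point that no decorrelation preprocessing is applied. Any trained mask estimator with the same input and output shapes could replace `toy_gain`.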
Pages: 816-829
Number of pages: 14