IMPLICIT ACOUSTIC ECHO CANCELLATION FOR KEYWORD SPOTTING AND DEVICE-DIRECTED SPEECH DETECTION

Cited by: 1
Authors
Cornell, Samuele [1, 2]
Balestri, Thomas [2]
Senechal, Thibaud [2]
Affiliations
[1] Univ Politecn Marche, Ancona, Italy
[2] Amazon Alexa AI, Seattle, WA USA
Source
2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022
Keywords
keyword spotting; human computer interaction; front-end processing; speech recognition; acoustic echo cancellation; EFFICIENT BLIND DEREVERBERATION
DOI
10.1109/SLT54892.2023.10022358
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In many speech-enabled human-machine interactions, user speech can overlap with the device playback audio. In these instances, the performance of tasks such as keyword spotting (KWS) and device-directed speech detection (DDD) can degrade significantly. To address this problem, we propose an implicit acoustic echo cancellation (iAEC) framework in which a neural network is trained to exploit the additional information from a reference microphone channel, learning to ignore the interfering signal and improve detection performance. We study this framework for KWS and DDD on, respectively, an augmented version of Google Speech Commands v2 and a real-world Alexa device dataset. Notably, we show a 56% reduction in false-reject rate for the DDD task during device playback conditions. For the KWS task, we also show comparable or superior performance to a strong end-to-end neural echo cancellation baseline with two orders of magnitude lower computational requirements.
Pages: 1052-1058
Number of pages: 7