IMPLICIT ACOUSTIC ECHO CANCELLATION FOR KEYWORD SPOTTING AND DEVICE-DIRECTED SPEECH DETECTION

Cited by: 1

Authors:

Cornell, Samuele [1, 2]

Balestri, Thomas [2]

Senechal, Thibaud [2]

Affiliations:

[1] Univ Politecn Marche, Ancona, Italy

[2] Amazon Alexa AI, Seattle, WA USA

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

keyword spotting; human computer interaction; front-end processing; speech recognition; acoustic echo cancellation; EFFICIENT BLIND DEREVERBERATION

DOI:

10.1109/SLT54892.2023.10022358

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In many speech-enabled human-machine interactions, user speech can overlap with the device playback audio. In these instances, the performance of tasks such as keyword spotting (KWS) and device-directed speech detection (DDD) can degrade significantly. To address this problem, we propose an implicit acoustic echo cancellation (iAEC) framework where a neural network is trained to exploit the additional information from a reference microphone channel to learn to ignore the interfering signal and improve detection performance. We study this framework for the tasks of KWS and DDD on, respectively, an augmented version of Google Speech Commands v2 and a real-world Alexa device dataset. Notably, we show a 56% reduction in false-reject rate for the DDD task during device playback conditions. We also show comparable or superior performance over a strong end-to-end neural echo cancellation baseline for the KWS task with two orders of magnitude lower computational requirements.
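The abstract does not say how the reference channel is presented to the network. As a minimal sketch, assuming the simplest possible arrangement, the microphone and reference signals could each be converted to per-frame features and stacked, so that a downstream detector sees both at every frame. The function names, the log-energy feature, and the frame sizes below are all hypothetical placeholders, not the authors' front end.

```python
import math
from typing import List


def frame_log_energy(signal: List[float], frame_len: int, hop: int) -> List[float]:
    """Log energy per frame; a stand-in for real acoustic features (assumption)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        # Small floor avoids log(0) on silent frames.
        feats.append(math.log(energy + 1e-8))
    return feats


def stack_mic_and_reference(
    mic: List[float], ref: List[float], frame_len: int = 4, hop: int = 2
) -> List[List[float]]:
    """Return one [mic_feature, ref_feature] pair per frame.

    Hypothetical illustration of giving a detector the reference channel
    as an extra input, rather than cancelling the echo explicitly first.
    """
    mic_feats = frame_log_energy(mic, frame_len, hop)
    ref_feats = frame_log_energy(ref, frame_len, hop)
    if len(mic_feats) != len(ref_feats):
        raise ValueError("mic and reference must yield the same number of frames")
    return [[m, r] for m, r in zip(mic_feats, ref_feats)]


# Toy signals: the microphone picks up energy while the reference is silent.
mic_signal = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
ref_signal = [0.0] * 8
stacked = stack_mic_and_reference(mic_signal, ref_signal)
```

Each row of `stacked` would then be fed to a KWS or DDD classifier, which is left to learn during training which part of the microphone input is explained by the reference.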

References

Pages: 1052-1058

Page count: 7

30 entries in total

[1] Nonlinear residual acoustic echo suppression for high levels of harmonic distortion [J].

Bendersky, Diego A. ;

Stokes, Jack W. ;

Malvar, Henrique S. .

2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, :261-+

[2]

Benesty J., 2001, Advances in network and acoustic echo cancellation

[3] Detecting and Counting Overlapping Speakers in Distant Speech Scenarios [J].

Cornell, Samuele ;

Omologo, Maurizio ;

Squartini, Stefano ;

Vincent, Emmanuel .

INTERSPEECH 2020, 2020, :3107-3111

[4] gpuRIR: A python library for room impulse response simulation with GPU acceleration [J].

Diaz-Guerra, David ;

Miguel, Antonio ;

Beltran, Jose R. .

MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (04) :5653-5671

[5] Deep Multitask Acoustic Echo Cancellation [J].

Fazel, Amin ;

El-Khamy, Mostafa ;

Lee, Jungwon .

INTERSPEECH 2019, 2019, :4250-4254

[6]

Fazel A, 2020, INT CONF ACOUST SPEE, P6919, DOI [10.1109/ICASSP40776.2020.9053508, 10.1109/icassp40776.2020.9053508]

[7]

Gillespie K, 2020, INT CONF ACOUST SPEE, P7859, DOI [10.1109/ICASSP40776.2020.9054304, 10.1109/icassp40776.2020.9054304]

[8] Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques [J].

Haeb-Umbach, Reinhold ;

Watanabe, Shinji ;

Nakatani, Tomohiro ;

Bacchiani, Michiel ;

Hoffmeister, Bjoern ;

Seltzer, Michael L. ;

Zen, Heiga ;

Souden, Mehrez .

IEEE SIGNAL PROCESSING MAGAZINE, 2019, 36 (06) :111-124

[9]

Hansler Eberhard, 2005, Acoustic echo and noise control: a practical approach. vol, V40

[10] A NEURAL ACOUSTIC ECHO CANCELLER OPTIMIZED USING AN AUTOMATIC SPEECH RECOGNIZER AND LARGE SCALE SYNTHETIC DATA [J].

Howard, Nathan ;

Park, Alex ;

Shabestary, Turaj Zakizadeh ;

Gruenstein, Alexander ;

Prabhavalkar, Rohit .

2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, :7128-7132
