Closely coupled array processing and model-based compensation for microphone array speech recognition

Cited by: 12

Authors:

Zhao, Xianyu [1]

Ou, Zhijian [1]

Affiliations:

[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007, Vol. 15, No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

array signal processing; microphone array; model-based compensation; robust speech recognition;

DOI:

10.1109/TASL.2006.881673

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In conventional microphone array speech recognition, the array processor and the speech recognizer are loosely coupled. The only connection between the two modules is the enhanced target signal output from the array processor, which is then treated as a single input to the recognizer. In this approach, useful environmental information, which can be provided by the array processor and also needs to be exploited by the recognizer, is ignored. Inherently, the array processor can generate multiple outputs of spatially filtered signals, as a multi-input-multi-output (MIMO) module. In this paper, a closely coupled approach is proposed, in which a recognizer with model-based noise compensation exploits the reference noise outputs from a MIMO array processor. Specifically, a multichannel model-based noise compensation is presented, including the compensation procedure using the vector Taylor series (VTS) expansion and parameter estimation using the expectation-maximization (EM) algorithm. It is also shown how to construct MIMO array processors from conventional beamformers. A number of practical implementations of the conventional loosely coupled approach and the proposed closely coupled approach were tested on a publicly available database, the Multichannel Overlapping Number Corpus (MONC). Experimental results showed that the proposed closely coupled approach significantly improved the speech recognition performance in the overlapping speech situations.
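To make the VTS compensation step mentioned in the abstract more concrete, the following is a minimal sketch of the textbook single-channel, first-order VTS update for one diagonal Gaussian in the log-spectral domain. It is an illustration under stated assumptions, not the multichannel formulation of the paper: the function name `vts_compensate`, the mismatch function y = x + log(1 + exp(n - x)), and the per-dimension treatment are all assumptions introduced here.

```python
import math


def vts_compensate(mu_x, var_x, mu_n, var_n):
    """First-order VTS compensation of one diagonal log-spectral Gaussian.

    Hypothetical helper for illustration only (single channel, additive
    noise, no channel distortion). All arguments are equal-length lists:
    clean-speech mean/variance and noise mean/variance per dimension.
    Returns the compensated (noisy-speech) mean and variance lists.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        # Assumed mismatch function y = x + log(1 + exp(n - x)),
        # evaluated at the expansion point (mean of x, mean of n).
        mu_y.append(mx + math.log1p(math.exp(mn - mx)))
        # g is dy/dx at the expansion point; dy/dn is then 1 - g.
        g = 1.0 / (1.0 + math.exp(mn - mx))
        # Linearized variance, treating speech and noise as independent.
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y


# Example: one dimension where speech and noise have equal mean energy.
noisy_mean, noisy_var = vts_compensate([0.0], [1.0], [0.0], [1.0])
```

When the speech and noise means are equal, the compensated mean is shifted up by ln 2 and each source contributes a quarter of its variance; when the noise mean lies far below the speech mean, the clean model is left essentially unchanged. In a closely coupled system as described in the abstract, the noise statistics would be informed by the reference noise outputs of the array processor rather than fixed by hand as in this sketch.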


Pages: 1114-1122

Page count: 9

50 items in total

[1] Closely coupled array processing and model-based compensation for microphone array speech recognition
Zhao, XY
Ou, ZJ
Che, MH
Wang, ZY
2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005, : 417 - 420
[2] Microphone Array Processing for Distant Speech Recognition
Kumatani, Kenichi
McDonough, John
Raj, Bhiksha
IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06) : 127 - 140
[3] Model-Based Post Filter for Microphone Array Speech Enhancement
Xiong, Yan
Chen, Qiang
Deng, Shuxia
Liang, Sheng
Wang, Kailian
Zhang, Jun
Wang, Jie
2018 7TH INTERNATIONAL CONFERENCE ON DIGITAL HOME (ICDH 2018), 2018, : 82 - 88
[4] Microphone Array Speech Processing
Nordholm, Sven
Abhayapala, Thushara
Doclo, Simon
Gannot, Sharon
Naylor, Patrick
Tashev, Ivan
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2010,
[5] Microphone Array Speech Processing
Sven Nordholm
ThusharaD Abhayapala
Simon Doclo
Sharon Gannot
P Naylor
Ivan Tashev
EURASIP Journal on Advances in Signal Processing, 2010
[6] Microphone Array Processing for Distant Speech Recognition: Spherical Arrays
McDonough, John
Kumatani, Kenichi
Raj, Bhiksha
2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
[7] Microphone Array Processing Strategies for Distant-Based Automatic Speech Recognition
Khoubrouy, Soudeh A.
Hansen, John H. L.
IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (10) : 1344 - 1348
[8] HMM adaptation and microphone array processing for distant speech recognition
Kleban, J
Gong, YF
2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1411 - 1414
[9] A Posterior Approach for Microphone Array Based Speech Recognition
Wang, Dong
Himawan, Ivan
Frankel, Joe
King, Simon
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 996 - 999
[10] Microphone array system for speech recognition
Kiyohara, K
Kaneda, Y
Takahashi, S
Nomura, H
Kojima, J
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS, 1997, : 215 - 218
