Closely coupled array processing and model-based compensation for microphone array speech recognition

Cited by: 12
Authors
Zhao, Xianyu [1]
Ou, Zhijian [1]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
array signal processing; microphone array; model-based compensation; robust speech recognition;
DOI
10.1109/TASL.2006.881673
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
In conventional microphone array speech recognition, the array processor and the speech recognizer are loosely coupled. The only connection between the two modules is the enhanced target signal output from the array processor, which is then treated as a single input to the recognizer. In this approach, useful environmental information, which the array processor can provide and the recognizer needs to exploit, is ignored. Inherently, the array processor can generate multiple outputs of spatially filtered signals, acting as a multi-input-multi-output (MIMO) module. In this paper, a closely coupled approach is proposed, in which a recognizer with model-based noise compensation exploits the reference noise outputs from a MIMO array processor. Specifically, a multichannel model-based noise compensation is presented, including the compensation procedure using the vector Taylor series (VTS) expansion and parameter estimation using the expectation-maximization (EM) algorithm. It is also shown how to construct MIMO array processors from conventional beamformers. A number of practical implementations of the conventional loosely coupled approach and the proposed closely coupled approach were tested on a publicly available database, the Multichannel Overlapping Number Corpus (MONC). Experimental results showed that the proposed closely coupled approach significantly improved speech recognition performance in overlapping speech situations.
Pages: 1114 - 1122
Page count: 9
Related papers
50 in total
  • [31] Speech recognition in cars by speaker localization using microphone array
    Kondo, Keisuke
    Nagai, Takayuki
    Kaneko, Masahide
    Kurematsu, Akira
    Systems and Computers in Japan, 2003, 34 (08) : 1 - 12
  • [32] Model-based feature compensation for robust speech recognition
    Shen, Haifeng
    Li, Qunxia
    Guo, Jun
    Liu, Gang
    FUNDAMENTA INFORMATICAE, 2006, 72 (04) : 529 - 539
  • [33] Model-based feature compensation for robust speech recognition
    School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing, 100876, China
    Unknown
    Unknown
    Fundam Inf, 2006, (04) : 529 - 539
  • [34] Processing of speech signals using a microphone array for intelligent robots
    Hu, I
    Cheng, CC
    Liu, WH
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART I-JOURNAL OF SYSTEMS AND CONTROL ENGINEERING, 2005, 219 (I2) : 133 - 143
  • [35] A microphone array processing technique for speech enhancement in a reverberant space
    Liu, QG
    Champagne, B
    Kabal, P
    SPEECH COMMUNICATION, 1996, 18 (04) : 317 - 334
  • [36] A signal subspace tracking algorithm for microphone array processing of speech
    Affes, S
    Grenier, Y
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (05) : 425 - 437
  • [37] Two-channel microphone array processing for speech enhancement
    Yan, ZL
    Du, LM
    Wei, JQ
    Zeng, H
    PROCEEDINGS OF THE 2003 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II: COMMUNICATIONS-MULTIMEDIA SYSTEMS & APPLICATIONS, 2003 : 548 - 551
  • [38] Calibration, optimization, and DSP implementation of microphone array for speech processing
    Wang, A
    Yao, K
    Hudson, RE
    Korompis, D
    Lorenzelli, F
    Soli, SD
    Gao, S
    VLSI SIGNAL PROCESSING, IX, 1996 : 221 - 230
  • [39] Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition
    Jen-Tzung Chien
    Jain-Ray Lai
    Journal of VLSI signal processing systems for signal, image and video technology, 2004, 36 : 141 - 151
  • [40] Use of microphone array and model adaptation for hands-free speech acquisition and recognition
    Chien, JT
    Lai, JR
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3) : 141 - 151