Microphone Array Processing for Distant Speech Recognition

Times cited: 85
Authors
Kumatani, Kenichi [1, 2, 3]
McDonough, John [2, 4]
Raj, Bhiksha [5]
Affiliations
[1] Disney Res, Pittsburgh, PA USA
[2] Univ Karlsruhe, Karlsruhe, Germany
[3] Idiap Res Inst, European Union Projects Comp Human Interact Loop, Martigny, Switzerland
[4] Univ Saarland, Saarbrucken, Germany
[5] Harvard Univ, Extens Sch, Cambridge, MA 02138 USA
Keywords
DESIGN;
DOI
10.1109/MSP.2012.2205285
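Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract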
Distant speech recognition (DSR) holds the promise of the most natural human-computer interface because it enables man-machine interactions through speech, without the necessity of donning intrusive body- or head-mounted microphones. Recognizing distant speech robustly, however, remains a challenge. This contribution provides a tutorial overview of DSR systems based on microphone arrays. In particular, we present recent work on acoustic beamforming for DSR, along with experimental results verifying the effectiveness of the various algorithms described here; beginning from a word error rate (WER) of 14.3% with a single microphone of a linear array, our state-of-the-art DSR system achieved a WER of 5.3%, which was comparable to that of 4.2% obtained with a lapel microphone. Moreover, we present an emerging technology in the area of far-field audio and speech processing based on spherical microphone arrays. Performance comparisons of spherical and linear arrays reveal that a spherical array with a diameter of 8.4 cm can provide recognition accuracy comparable to or better than that obtained with a large linear array with an aperture length of 126 cm. © 2012 IEEE.
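Note on the reported figures: the abstract gives three WERs (14.3% for a single array microphone, 5.3% for the full DSR system, 4.2% for a lapel microphone) but not the relative improvement they imply. The following is a minimal sketch of that arithmetic; the function and variable names are hypothetical and are not taken from the paper.

```python
def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Return the relative WER reduction as a percentage of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - improved_wer) / baseline_wer


# WER values (in percent) as quoted in the abstract.
single_mic_wer = 14.3  # single microphone of the linear array
array_wer = 5.3        # state-of-the-art DSR system
lapel_wer = 4.2        # close-talking lapel microphone

# Absolute improvement in percentage points over the single microphone.
absolute_gain = single_mic_wer - array_wer
# The same improvement expressed relative to the single-microphone baseline.
relative_gain = relative_reduction(single_mic_wer, array_wer)
# Remaining gap, in percentage points, to the lapel microphone.
gap_to_lapel = array_wer - lapel_wer

print(f"Absolute WER reduction: {absolute_gain:.1f} points")
print(f"Relative WER reduction: {relative_gain:.1f}%")
print(f"Remaining gap to lapel microphone: {gap_to_lapel:.1f} points")
```

Under these figures, array processing removes roughly 63% of the single-microphone errors and leaves a gap of about 1.1 percentage points to the lapel microphone.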
Pages: 127-140
Page count: 14