Making Machines Understand Us in Reverberant Rooms

Cited by: 179

Authors:

Yoshioka, Takuya [1]

Sehr, Armin [2]

Delcroix, Marc [1,3]

Kinoshita, Keisuke [1]

Maas, Roland [4]

Nakatani, Tomohiro [5]

Kellermann, Walter [4,6]

Affiliations:

[1] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan

[2] Ericsson Eurolab, Nurnberg, Germany

[3] Pixela Corp, Osaka, Japan

[4] Univ Erlangen Nurnberg, Nurnberg, Germany

[5] NTT Corp, Basic Res Lab, Kyoto, Japan

[6] AT&T Bell Labs, Murray Hill, NJ USA

Source:

IEEE SIGNAL PROCESSING MAGAZINE | 2012 / Vol. 29 / No. 6

Keywords:

CONVOLUTIONAL DISTORTION; MODEL ADAPTATION; SPEECH; DEREVERBERATION; FEATURES; COMPENSATION; SUPPRESSION;

DOI:

10.1109/MSP.2012.2205029

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Speech recognition technology has left the research laboratory and is increasingly coming into practical use, enabling a wide spectrum of innovative and exciting voice-driven applications that are radically changing our way of accessing digital services and information. Most of today's applications still require a microphone located near the talker. However, almost all of these applications would benefit from distant-talking speech capturing, where talkers are able to speak at some distance from the microphones without the encumbrance of handheld or body-worn equipment [1]. For example, applications such as meeting speech recognition, automatic annotation of consumer-generated videos, speech-to-speech translation in teleconferencing, and hands-free interfaces for controlling consumer products such as interactive TV will greatly benefit from distant-talking operation. Furthermore, for a number of unexplored but important applications, distant microphones are a prerequisite. This means that distant-talking speech recognition technology is essential for extending the availability of speech recognizers as well as enhancing the convenience of existing speech recognition applications. © 2012 IEEE.

Pages: 114-126

Page count: 13

References (48 entries in total):

[1] ALLEN, JB; BERKLEY, DA. IMAGE METHOD FOR EFFICIENTLY SIMULATING SMALL-ROOM ACOUSTICS [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1979, 65 (04): 943-950

[2] [Anonymous], 2000, INTERSPEECH, DOI 10.1016/S0167-6393(03)00016-5

[3] [Anonymous], 2009, Distant Speech Recognition

[4] Buchner H, 2010, SIGNALS COMMUN TECHN, P311, DOI 10.1007/978-1-84996-056-4_10

[5] Delcroix M, 2011, ROBUST SPEECH RECOGNITION OF UNCERTAIN OR MISSING DATA: THEORY AND APPLICATIONS, P225, DOI 10.1007/978-3-642-21317-5_9

[6] Deng, L; Droppo, J; Acero, A. Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2005, 13 (03): 412-421

[7] Droppo J., 2008, Springer Handbook of Speech Processing, P653

[8] Erkelens, Jan S.; Heusdens, Richard. Correlation-Based and Model-Based Blind Single-Channel Late-Reverberation Suppression in Noisy Time-Varying Acoustical Environments [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (07): 1746-1765

[9] Falk, Tiago H.; Chan, Wai-Yip. Modulation Spectral Features for Robust Far-Field Speaker Identification [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2010, 18 (01): 90-100

[10] FURUI, S. SPEAKER-INDEPENDENT ISOLATED WORD RECOGNITION USING DYNAMIC FEATURES OF SPEECH SPECTRUM [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1986, 34 (01): 52-59
