A Subband-Based Stationary-Component Suppression Method Using Harmonics and Power Ratio for Reverberant Speech Recognition

Cited by: 3

Authors:

Cho, Byung Joon [1]

Kwon, Haeyong [1]

Cho, Ji-Won [1]

Kim, Chanwoo [2]

Stern, Richard M. [3]

Park, Hyung-Min [1]

Affiliations:

[1] Sogang Univ, Dept Elect Engn, Seoul 04107, South Korea

[2] Google Corp, Mountain View, CA 94043 USA

[3] Carnegie Mellon Univ, Dept Elect & Comp Engn, Pittsburgh, PA 15213 USA

Source:

IEEE SIGNAL PROCESSING LETTERS | 2016 / Vol. 23 / No. 06

Keywords:

Harmonics; precedence effect; reverberation; robust speech recognition

DOI:

10.1109/LSP.2016.2554888

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronics and Communication Technology]

Discipline classification codes:

0808; 0809

Abstract:

This letter describes a preprocessing method called subband-based stationary-component suppression method using harmonics and power ratio (SHARP) processing for reverberant speech recognition. SHARP processing extends a previous algorithm called Suppression of Slowly varying components and the Falling edge (SSF), which suppresses the steady-state portions of subband spectral envelopes. The SSF algorithm tends to over-subtract these envelopes in highly reverberant environments when there are high levels of power in previous analysis frames. The proposed SHARP method prevents excessive suppression both by boosting the floor value using the harmonics in voiced speech segments and by inhibiting the subtraction for unvoiced speech by detecting frames in which power is concentrated in high-frequency channels. These modifications enable the SHARP algorithm to improve recognition accuracy by further reducing the mismatch between power contours of clean and reverberated speech. Experimental results indicate that the SHARP method provides better recognition accuracy in highly reverberant environments compared to the SSF algorithm. It is also shown that the performance of the SHARP method can be further improved by combining it with feature-space maximum likelihood linear regression (fMLLR).
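
As a rough illustration of the kind of processing the abstract describes, the following is a minimal sketch of an SSF-style baseline: each subband power contour is low-pass filtered over frames, the slowly varying part is subtracted, and a floor limits the subtraction. A simple gate then skips the subtraction for frames whose power is concentrated in the high-frequency channels. The first-order recursion, its initialization, the constants `lam=0.4` and `floor_coeff=0.01`, the threshold, and all function names are assumptions made for illustration; they are not given in this record and should not be read as the SHARP algorithm itself. The harmonics-based floor boost for voiced frames is not sketched here.

```python
import numpy as np


def ssf_like_suppression(power, lam=0.4, floor_coeff=0.01):
    """Suppress slowly varying subband power (illustrative SSF-style baseline).

    power: array of shape (frames, channels) with non-negative subband power.
    lam and floor_coeff are assumed values, not taken from this record.
    """
    power = np.asarray(power, dtype=float)
    lowpassed = np.empty_like(power)
    lowpassed[0] = power[0]  # assumed initialization of the recursion
    for m in range(1, power.shape[0]):
        # First-order low-pass filter along the frame axis, per channel.
        lowpassed[m] = lam * lowpassed[m - 1] + (1.0 - lam) * power[m]
    # Subtract the slowly varying part, but never go below a small floor.
    return np.maximum(power - lowpassed, floor_coeff * power)


def high_band_power_ratio(power, split_channel):
    """Fraction of each frame's power in channels >= split_channel."""
    power = np.asarray(power, dtype=float)
    total = power.sum(axis=1)
    high = power[:, split_channel:].sum(axis=1)
    # Frames with zero total power get a ratio of 0.
    return np.divide(high, total, out=np.zeros_like(total), where=total > 0)


def gated_suppression(power, split_channel, ratio_threshold=0.5, **kwargs):
    """Hypothetical gate: leave high-band-dominated frames unsuppressed."""
    power = np.asarray(power, dtype=float)
    out = ssf_like_suppression(power, **kwargs)
    keep = high_band_power_ratio(power, split_channel) > ratio_threshold
    out[keep] = power[keep]
    return out
```

In this sketch a steady input is driven down to the floor, while a sudden onset keeps most of its power, which is the behavior the abstract attributes to suppressing steady-state portions of the envelope.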

Pages: 780-784

Page count: 5
