Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain

Cited by: 3

Authors:

Lee, Yun-Kyung [1]

Park, Jeon Gue [1]

Lee, Yun Keun [1]

Kwon, Oh-Wook [2]

Affiliations:

[1] ETRI, SW Content Res Lab, Taejon, South Korea

[2] Chungbuk Nat Univ, Sch Elect Engn, Cheongju, South Korea

Source:

ETRI JOURNAL | 2014 / Vol. 36 / No. 05

Keywords:

Phase modeling; speech enhancement; speech separation; decision-directed approach; minimum mean square error estimator; RECOGNITION; NOISE;

DOI:

10.4218/etrij.14.2214.0039

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

We propose a novel phase-based method for single-channel speech enhancement to extract and enhance the desired signals in noisy environments by utilizing the phase information. In the method, a phase-dependent a priori signal-to-noise ratio (SNR) is estimated in the log-mel spectral domain to utilize both the magnitude and phase information of input speech signals. The phase-dependent estimator is incorporated into the conventional magnitude-based decision-directed approach that recursively computes the a priori SNR from noisy speech. Additionally, we reduce the performance degradation owing to the one-frame delay of the estimated phase-dependent a priori SNR by using a minimum mean square error (MMSE)-based and maximum a posteriori (MAP)-based estimator. In our speech enhancement experiments, the proposed phase-dependent a priori SNR estimator is shown to improve the output SNR by 2.6 dB for both the MMSE-based and MAP-based estimator cases as compared to a conventional magnitude-based estimator.
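
For orientation, the conventional magnitude-based decision-directed recursion mentioned in the abstract can be sketched as follows. This is a minimal illustration of the classic estimator (in the style of Ephraim and Malah), not the phase-dependent log-mel method of the paper. The function name, the smoothing factor `alpha = 0.98`, the floor `xi_min`, the use of a Wiener gain, and the assumption of a known, constant noise power for a single frequency bin are all choices made here for illustration.

```python
def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR track for one frequency bin.

    noisy_power: per-frame noisy-speech power |Y_t|^2 (hypothetical input)
    noise_power: noise power, assumed known and constant here
    """
    xi_track = []
    prev_clean_power = 0.0  # |A_hat_{t-1}|^2, clean-speech estimate of previous frame
    for t, y_pow in enumerate(noisy_power):
        gamma = y_pow / noise_power        # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)    # instantaneous (maximum-likelihood) estimate
        if t == 0:
            xi = ml_term                   # no previous frame to recurse on
        else:
            # Weighted sum of the previous frame's estimate and the current ML term
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)               # floor to keep the estimate positive
        gain = xi / (1.0 + xi)             # Wiener gain (illustrative choice)
        prev_clean_power = gain ** 2 * y_pow
        xi_track.append(xi)
    return xi_track
```

Because the recursion leans heavily on the previous frame's estimate, a sudden speech onset is only partly reflected in the current frame's a priori SNR. This is the one-frame delay that the abstract refers to.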


Pages: 721-729

Page count: 9
