Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain

Cited by: 3
Authors
Lee, Yun-Kyung [1]
Park, Jeon Gue [1]
Lee, Yun Keun [1]
Kwon, Oh-Wook [2]
Affiliations
[1] ETRI, SW Content Res Lab, Taejon, South Korea
[2] Chungbuk Nat Univ, Sch Elect Engn, Cheongju, South Korea
Keywords
Phase modeling; speech enhancement; speech separation; decision-directed approach; minimum mean square error estimator; recognition; noise
DOI
10.4218/etrij.14.2214.0039
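Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract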
We propose a novel phase-based method for single-channel speech enhancement that uses phase information to extract and enhance the desired signal in noisy environments. In this method, a phase-dependent a priori signal-to-noise ratio (SNR) is estimated in the log-mel spectral domain so that both the magnitude and the phase information of the input speech signal are used. The phase-dependent estimator is incorporated into the conventional magnitude-based decision-directed approach, which recursively computes the a priori SNR from noisy speech. Additionally, we reduce the performance degradation caused by the one-frame delay of the estimated phase-dependent a priori SNR by using minimum mean square error (MMSE)-based and maximum a posteriori (MAP)-based estimators. In our speech enhancement experiments, the proposed phase-dependent a priori SNR estimator improves the output SNR by 2.6 dB over a conventional magnitude-based estimator, for both the MMSE-based and the MAP-based estimators.
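For orientation, the conventional magnitude-based decision-directed recursion that the abstract builds on can be sketched as follows. This is a minimal illustration of the classic Ephraim-Malah style update for a single frequency bin, not the paper's phase-dependent log-mel estimator. The function name, the smoothing factor `alpha = 0.98`, the floor `xi_min`, the known constant noise power, and the use of a Wiener gain in place of the MMSE or MAP gain are all assumptions made for the example.

```python
from typing import List


def decision_directed_snr(
    noisy_power: List[float],
    noise_power: float,
    alpha: float = 0.98,   # assumed smoothing factor (a commonly used value)
    xi_min: float = 1e-3,  # assumed lower floor on the a priori SNR
) -> List[float]:
    """Magnitude-only decision-directed a priori SNR for one frequency bin.

    noisy_power holds |Y_t|^2 per frame; noise_power is an assumed-known,
    constant noise variance. Returns the a priori SNR estimate per frame.
    """
    xi_track: List[float] = []
    prev_clean_power = 0.0  # |A_hat_{t-1}|^2, taken as zero before frame 0
    for y2 in noisy_power:
        gamma = y2 / noise_power            # a posteriori SNR of this frame
        ml_term = max(gamma - 1.0, 0.0)     # instantaneous SNR estimate
        # Recursive mix of the previous frame's clean-speech estimate
        # and the current frame's instantaneous estimate.
        xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)              # Wiener gain, a simple stand-in
        prev_clean_power = (gain ** 2) * y2
        xi_track.append(xi)
    return xi_track


if __name__ == "__main__":
    # Two noise-only frames followed by a sudden speech onset.
    print(decision_directed_snr([1.0, 1.0, 100.0, 100.0], noise_power=1.0))
```

In this toy input, the estimate at the onset frame stays far below the instantaneous SNR and only catches up one frame later, which is the kind of one-frame delay the abstract refers to.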
Pages: 721-729
Number of pages: 9