SPECTROGRAMS FUSION WITH MINIMUM DIFFERENCE MASKS ESTIMATION FOR MONAURAL SPEECH DEREVERBERATION

Cited by: 0
Authors
Shi, Hao [1]
Wang, Longbiao [1]
Ge, Meng [1]
Li, Sheng [2]
Dang, Jianwu [1, 3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
speech dereverberation; spectrograms fusion; multi-target learning; two-stage; deep learning; ENHANCEMENT;
DOI
10.1109/icassp40776.2020.9054661
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Spectrograms fusion is an effective method for combining complementary speech dereverberation systems. Previous linear spectrograms fusion, which averages multiple spectrograms, shows outstanding performance. However, this simple method cannot be applied to systems that use different features. In this study, we design minimum difference masks (MDMs) that classify the time-frequency (T-F) bins in spectrograms according to their nearest distances from the labels. We then propose a two-stage nonlinear spectrograms fusion system for speech dereverberation. First, a multi-target learning-based speech dereverberation front-end model produces multiple spectrograms simultaneously. Then, MDMs are estimated to select the best parts of the different spectrograms. The spectrograms from the first stage and the MDMs from the second stage are used to recombine the T-F bins. Experiments on the REVERB challenge show strong feature complementarity between the spectrograms and the MDMs. Moreover, the proposed framework consistently and significantly improves PESQ and SRMR on both real and simulated data, e.g., an average PESQ gain of 0.1 on all simulated data and an average SRMR gain of 1.22 on all real data.
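The mask idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact MDM definition, so the sketch assumes two candidate magnitude spectrograms, an absolute-difference distance to the label, and a binary mask that marks which candidate is closer in each T-F bin. The function names `minimum_difference_mask` and `fuse_spectrograms` are hypothetical. Here the mask is computed from the label (an oracle); in the described system the MDMs are estimated by a second-stage model, since labels are not available at test time.

```python
import numpy as np


def minimum_difference_mask(est_a, est_b, label):
    """Oracle binary mask (assumed form): 1 where est_a is at least as
    close to the label as est_b in a T-F bin, 0 otherwise."""
    closer_a = np.abs(est_a - label) <= np.abs(est_b - label)
    return closer_a.astype(est_a.dtype)


def fuse_spectrograms(est_a, est_b, mask):
    """Nonlinear fusion: take est_a where mask is 1 and est_b where it is 0."""
    return mask * est_a + (1.0 - mask) * est_b


# Toy data: a 4 x 6 "clean" spectrogram and two noisy estimates of it.
rng = np.random.default_rng(0)
label = rng.random((4, 6))
est_a = label + 0.1 * rng.standard_normal(label.shape)
est_b = label + 0.1 * rng.standard_normal(label.shape)

mask = minimum_difference_mask(est_a, est_b, label)
fused = fuse_spectrograms(est_a, est_b, mask)

# Per-bin absolute errors of the fused result and of each candidate.
err_fused = np.abs(fused - label)
err_a = np.abs(est_a - label)
err_b = np.abs(est_b - label)
```

With an oracle mask, the fused error in every bin equals the smaller of the two candidate errors, which is why bin-wise selection can outperform plain averaging when the candidates are complementary. How close an estimated mask gets to this bound depends on the second-stage model.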
Pages: 7544 - 7548
Page count: 5