SPECTROGRAMS FUSION WITH MINIMUM DIFFERENCE MASKS ESTIMATION FOR MONAURAL SPEECH DEREVERBERATION

Cited by: 0
Authors
Shi, Hao [1 ]
Wang, Longbiao [1 ]
Ge, Meng [1 ]
Li, Sheng [2 ]
Dang, Jianwu [1 ,3 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
speech dereverberation; spectrograms fusion; multi-target learning; two-stage; deep learning; ENHANCEMENT;
DOI
10.1109/icassp40776.2020.9054661
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Spectrograms fusion is an effective method for incorporating complementary speech dereverberation systems. Previous linear spectrograms fusion, which averages multiple spectrograms, shows outstanding performance. However, this simple method cannot be applied to systems that use different features. In this study, we design minimum difference masks (MDMs) to classify the time-frequency (T-F) bins in spectrograms according to their nearest distances from the labels. We then propose a two-stage nonlinear spectrograms fusion system for speech dereverberation. First, we build a multi-target learning-based speech dereverberation front-end model to obtain multiple spectrograms simultaneously. Then, MDMs are estimated to take the best parts of the different spectrograms. The spectrograms from the first stage and the MDMs from the second stage are used to recombine T-F bins. Experiments on the REVERB challenge show strong feature complementarity between spectrograms and MDMs. Moreover, the proposed framework consistently and significantly improves PESQ and SRMR on both real and simulated data, e.g., an average PESQ gain of 0.1 on all simulated data and an average SRMR gain of 1.22 on all real data.
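The mask-and-recombine idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the absolute difference as the distance, the one-hot form of the masks, and the function names `minimum_difference_masks` and `fuse_spectrograms` are all assumptions made for illustration. The sketch computes oracle masks from a known label; in the paper's second stage the masks are estimated by a model, since labels are not available at test time.

```python
import numpy as np


def minimum_difference_masks(estimates, label):
    """Oracle masks marking, per T-F bin, the estimate closest to the label.

    estimates: array of shape (K, T, F), K candidate spectrograms.
    label:     array of shape (T, F), the clean reference spectrogram.
    Returns one-hot masks of shape (K, T, F).
    Assumption: absolute difference is the distance; ties go to the
    lowest-index estimate (np.argmin behavior).
    """
    diffs = np.abs(estimates - label[None, :, :])
    winner = np.argmin(diffs, axis=0)  # shape (T, F)
    indices = np.arange(estimates.shape[0])[:, None, None]
    return (indices == winner[None, :, :]).astype(estimates.dtype)


def fuse_spectrograms(estimates, masks):
    """Recombine T-F bins: keep each bin from the estimate its mask selects."""
    return np.sum(estimates * masks, axis=0)


# Toy example with two hypothetical 2x2 "spectrograms".
label = np.array([[1.0, 2.0], [3.0, 4.0]])
estimates = np.stack([
    np.array([[1.1, 2.5], [2.0, 4.0]]),
    np.array([[0.5, 2.1], [3.2, 4.3]]),
])
masks = minimum_difference_masks(estimates, label)
fused = fuse_spectrograms(estimates, masks)
```

Because each bin of the fused result is taken from whichever estimate is nearest to the label, its per-bin error is never larger than that of any single estimate; this is the sense in which the masks "take the best parts" of the spectrograms.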
Pages: 7544 - 7548
Number of pages: 5
Related papers
24 in total
  • [11] Robust Speech Dereverberation Based on Blind Adaptive Estimation of Acoustic Channels
    Haque, Mohammad Ariful
    Islam, Toufiqul
    Hasan, Md Kamrul
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04) : 775 - 787
  • [12] Bifurcation and Reunion: A Loss-Guided Two-Stage Approach for Monaural Speech Dereverberation
    Luo, Xiaoxue
    Zheng, Chengshi
    Li, Andong
    Ke, Yuxuan
    Li, Xiaodong
    INTERSPEECH 2022, 2022 : 2503 - 2507
  • [13] Speech dereverberation based on blind estimation of a reverberation filter
    Zee, Min-Seon
    Park, Hyung-Min
    IEICE ELECTRONICS EXPRESS, 2009, 6 (20) : 1456 - 1461
  • [14] Spectrograms Fusion-based End-to-end Robust Automatic Speech Recognition
    Shi, Hao
    Wang, Longbiao
    Li, Sheng
    Fang, Cunhang
    Dang, Jianwu
    Kawahara, Tatsuya
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021 : 438 - 442
  • [15] Utterance-based Speech Dereverberation using Blind Channel Estimation and Multichannel Equalization
    Haque, Mohammad Ariful
    2014 INTERNATIONAL CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (ICECE), 2014 : 274 - 277
  • [16] CAT-DUnet: Enhancing Speech Dereverberation via Feature Fusion and Structural Similarity Loss
    Xiang, Bajian
    Mao, Wenyu
    Tan, Kaijun
    Lu, Huaxiang
    IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 456 - 460
  • [17] SkipConvGAN: Monaural Speech Dereverberation Using Generative Adversarial Networks via Complex Time-Frequency Masking
    Kothapally, Vinay
    Hansen, John H. L.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2022, 30 : 1600 - 1613
  • [18] A new Genetic Algorithm based fusion scheme in monaural CASA system to improve the performance of the speech
    Shoba, S.
    Rajavel, R.
    JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2020, 11 (01) : 433 - 446
  • [19] An Expectation-Maximization Algorithm for Multimicrophone Speech Dereverberation and Noise Reduction With Coherence Matrix Estimation
    Schwartz, Ofer
    Gannot, Sharon
    Habets, Emanuel A. P.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (09) : 1495 - 1510
  • [20] Analysis of trade-offs between magnitude and phase estimation in loss functions for speech denoising and dereverberation
    Luo, Xiaoxue
    Zheng, Chengshi
    Li, Andong
    Ke, Yuxuan
    Li, Xiaodong
    SPEECH COMMUNICATION, 2022, 145 : 71 - 87