SPECTROGRAMS FUSION WITH MINIMUM DIFFERENCE MASKS ESTIMATION FOR MONAURAL SPEECH DEREVERBERATION

Cited by: 0
Authors
Shi, Hao [1]
Wang, Longbiao [1]
Ge, Meng [1]
Li, Sheng [2]
Dang, Jianwu [1,3]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin Key Lab Cognit Comp & Applicat, Tianjin, Peoples R China
[2] Natl Inst Informat & Commun Technol NICT, Kyoto, Japan
[3] Japan Adv Inst Sci & Technol, Nomi, Ishikawa, Japan
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
speech dereverberation; spectrograms fusion; multi-target learning; two-stage; deep learning; ENHANCEMENT;
DOI
10.1109/icassp40776.2020.9054661
Chinese Library Classification (CLC): O42 [Acoustics];
Subject classification codes: 070206; 082403;
Abstract
Spectrograms fusion is an effective method for combining complementary speech dereverberation systems. Previous linear spectrograms fusion, which simply averages multiple spectrograms, performs well, but this simple scheme cannot be applied to systems built on different features. In this study, we design minimum difference masks (MDMs) that classify the time-frequency (T-F) bins of the spectrograms according to which estimate lies nearest the label, and we propose a two-stage nonlinear spectrograms fusion system for speech dereverberation. In the first stage, a multi-target learning-based dereverberation front-end produces several spectrogram estimates simultaneously; in the second stage, MDMs are estimated to select the best parts of the different spectrograms, and the selected T-F bins are recombined. Experiments on the REVERB challenge show a strong feature complementarity between the spectrograms and the MDMs. Moreover, the proposed framework consistently and significantly improves PESQ and SRMR on both real and simulated data, e.g., an average PESQ gain of 0.1 on all simulated data and an average SRMR gain of 1.22 on all real data.
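The abstract describes the MDMs only in words: each T-F bin is assigned to whichever estimated spectrogram lies nearest the label, and the resulting masks are used to recombine the bins. As a rough illustration of that idea only, not the paper's actual implementation, the sketch below builds oracle MDMs from two hypothetical magnitude-spectrogram estimates and a clean reference and fuses them bin by bin; the function names, array shapes, and NumPy formulation are my own assumptions. In the paper itself the MDMs are estimated by the second-stage network, since labels are unavailable at inference time.

```python
import numpy as np


def minimum_difference_masks(estimates, reference):
    """Oracle MDMs: for every T-F bin, mark which estimate is closest to the label."""
    # Absolute difference between each estimated spectrogram and the reference: (N, F, T)
    diffs = np.stack([np.abs(est - reference) for est in estimates])
    # Index of the estimate with the minimum difference at each bin
    winner = np.argmin(diffs, axis=0)
    # One binary mask per estimate; the masks sum to 1 at every bin
    return [(winner == i).astype(np.float32) for i in range(len(estimates))]


def fuse_spectrograms(estimates, masks):
    """Recombine T-F bins by taking each bin from the estimate its mask selects."""
    return sum(mask * est for mask, est in zip(masks, estimates))


# Toy usage with random magnitude spectrograms (shapes are arbitrary)
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, T = 257, 100
    reference = rng.random((F, T)).astype(np.float32)
    estimates = [
        reference + 0.1 * rng.standard_normal((F, T)).astype(np.float32),
        reference + 0.2 * rng.standard_normal((F, T)).astype(np.float32),
    ]
    masks = minimum_difference_masks(estimates, reference)
    fused = fuse_spectrograms(estimates, masks)
    print(fused.shape)  # (257, 100)
```

With oracle masks the fused spectrogram is, at every bin, at least as close to the reference as the best individual estimate, which is what motivates training a network to approximate these masks.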
Pages: 7544-7548
Number of pages: 5