PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT

Times cited: 0
Authors
Du, Zhihao [1]
Lei, Ming [2]
Han, Jiqing [1]
Zhang, Shiliang [2]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
[2] Alibaba Group, Machine Intelligence Technology, Hangzhou, People's Republic of China
Source
2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Funding
National Natural Science Foundation of China
Keywords
Monaural speech enhancement; phonetic posteriorgram; phoneme-aware network
DOI
10.1109/icassp40776.2020.9054334
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Current methods for monaural speech enhancement rely almost exclusively on acoustic information and seldom consider the phonetic information of an utterance. In the voice conversion community, significant progress has been achieved by exploiting phonetic information through phonetic posteriorgrams (PPGs). Inspired by this progress, we propose a phoneme-aware network (PAN) that uses the PPGs of noisy speech for speech enhancement. Since PPG prediction and speech enhancement benefit each other, a PPG predictor is incorporated into PAN, and an iterative training algorithm is proposed for PAN. Experimental results show that using phonetic information improves enhancement performance in terms of speech intelligibility, perceptual quality, and character error rate. To the best of our knowledge, this is the first work to introduce PPGs into speech enhancement.
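To make the notion of a phonetic posteriorgram concrete, here is a minimal NumPy sketch. It assumes a PPG is a T x P matrix of frame-level phoneme posteriors (a softmax over P phoneme classes for each of T frames) and that the PPG is fused with the noisy acoustic features by simple frame-wise concatenation. The abstract does not say how PAN combines the two, so the function names, the 257-bin log-magnitude features, and the 40 phoneme classes are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

def phonetic_posteriorgram(logits: np.ndarray) -> np.ndarray:
    """Turn frame-level phoneme logits (T x P) into a PPG.

    Each row becomes a probability distribution over the P phoneme classes.
    (Hypothetical helper: the logits stand in for a PPG predictor's output.)
    """
    shifted = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def phoneme_aware_input(noisy_features: np.ndarray, ppg: np.ndarray) -> np.ndarray:
    """Concatenate acoustic features (T x F) with the PPG (T x P) frame by frame.

    Concatenation is an assumed fusion scheme, chosen only for illustration.
    """
    if noisy_features.shape[0] != ppg.shape[0]:
        raise ValueError("features and PPG must have the same number of frames")
    return np.concatenate([noisy_features, ppg], axis=1)

# Usage with random stand-in data: 100 frames, 257 frequency bins, 40 phoneme classes.
rng = np.random.default_rng(0)
T, F, P = 100, 257, 40
ppg = phonetic_posteriorgram(rng.standard_normal((T, P)))
noisy_features = np.log1p(np.abs(rng.standard_normal((T, F))))  # log-magnitude-like features
x = phoneme_aware_input(noisy_features, ppg)
print(x.shape)  # (100, 297)
```

Concatenation keeps the sketch simple; other fusion schemes, such as injecting the PPG into an intermediate layer of the enhancement network, are equally consistent with the abstract.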
Pages: 6634-6638
Number of pages: 5