PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT

Times cited: 0
Authors
Du, Zhihao [1]
Lei, Ming [2]
Han, Jiqing [1]
Zhang, Shiliang [2]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Funding
National Natural Science Foundation of China
Keywords
Monaural speech enhancement; phonetic posteriorgram; phoneme-aware network
DOI
10.1109/icassp40776.2020.9054334
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Current methods for monaural speech enhancement rely on acoustic information and seldom consider the phonetic information of an utterance. In the voice conversion community, significant progress has been achieved by exploiting phonetic information through phonetic posteriorgrams (PPGs). Inspired by this progress, we propose a phoneme-aware network (PAN) that uses noisy PPGs for speech enhancement. Since PPG prediction and speech enhancement benefit from each other, a PPG predictor is incorporated into the PAN and an iterative training algorithm is proposed for it. Experimental results show that using phonetic information improves enhancement performance in terms of speech intelligibility, perceptual quality, and character error rate. To the best of our knowledge, this is the first work to introduce PPGs into speech enhancement.
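As a rough illustration of what a PPG is, the sketch below turns frame-wise phoneme scores into posteriors (one probability distribution over phoneme classes per frame) and pairs them with noisy acoustic features. The abstract does not say how PAN fuses PPGs with acoustic input, so the feature-wise concatenation, the function names, and the toy dimensions here are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Convert one frame of phoneme scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def posteriorgram(frame_logits):
    """Build a PPG: one posterior over phoneme classes per time frame."""
    return [softmax(frame) for frame in frame_logits]


def phoneme_aware_input(noisy_frames, ppg):
    """Hypothetical fusion: append each frame's PPG to its acoustic features."""
    if len(noisy_frames) != len(ppg):
        raise ValueError("acoustic features and PPG must have the same number of frames")
    return [list(a) + list(p) for a, p in zip(noisy_frames, ppg)]


# Toy data (assumed): 2 frames, 3 spectral bins, 4 phoneme classes.
noisy_frames = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.9]]
frame_logits = [[2.0, 0.5, -1.0, 0.0], [0.0, 0.0, 3.0, 1.0]]

ppg = posteriorgram(frame_logits)
fused = phoneme_aware_input(noisy_frames, ppg)
```

Each fused frame here has 3 + 4 = 7 values; an enhancement network would take such frames as input. In the paper's setting the PPG would come from a predictor run on noisy speech rather than from hand-written scores.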
Pages: 6634-6638
Page count: 5
Related papers
29 in total
  • [21] Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement
    Tan, Ke
    Wang, DeLiang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 380 - 390
  • [22] IMPROVING ROBUSTNESS OF DEEP LEARNING BASED MONAURAL SPEECH ENHANCEMENT AGAINST PROCESSING ARTIFACTS
    Tan, Ke
    Wang, DeLiang
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6914 - 6918
  • [23] Joint Optimization of Modified Ideal Radio Mask and Deep Neural Networks for Monaural Speech Enhancement
    Han, Wei
    Wu, Congming
    Zhang, Xiongwei
    Zhang, Qiye
    Bai, Songting
    2017 IEEE 9TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN), 2017, : 1070 - 1074
  • [24] Supervised monaural speech enhancement using two-level complementary joint sparse representations
    Fu, Jiafei
    Zhang, Long
    Ye, Zhongfu
    APPLIED ACOUSTICS, 2018, 132 : 1 - 7
  • [25] MONAURAL SPEECH ENHANCEMENT BASED ON TWO STAGE LONG SHORT-TERM MEMORY NETWORKS
    Xian, Yang
    Sun, Yang
    Wang, Wenwu
    Naqvi, Syed Mohsen
    2019 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2019,
  • [26] First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement
    Dang, Feng
    Chen, Hangting
    Hu, Qi
    Zhang, Pengyuan
    Yan, Yonghong
    SPEECH COMMUNICATION, 2023, 146 : 32 - 44
  • [27] GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block
    Xu, Xinmeng
    Wang, Yang
    Jia, Jie
    Chen, Binbin
    Hao, Jianjun
    INTERSPEECH 2022, 2022, : 966 - 970
  • [28] Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement
    Li, Andong
    Liu, Wenzhe
    Zheng, Chengshi
    Fan, Cunhang
    Li, Xiaodong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1829 - 1843
  • [29] DeConformer-SENet: An efficient deformable conformer speech enhancement network
    Li, Man
    Liu, Ya
    Zhou, Li
    DIGITAL SIGNAL PROCESSING, 2025, 156