PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT

Cited by: 0

Authors:

Du, Zhihao [1]

Lei, Ming [2]

Han, Jiqing [1]

Zhang, Shiliang [2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

[2] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Monaural speech enhancement; phonetic posteriorgram; phoneme-aware network;

DOI:

10.1109/icassp40776.2020.9054334

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Current methods for monaural speech enhancement mostly utilize only acoustic information and seldom consider the phonetic information of an utterance. In the voice conversion community, significant progress has been achieved by using phonetic information via phonetic posteriorgrams (PPGs). Inspired by this progress, we propose a phoneme-aware network (PAN) that utilizes noisy PPGs for speech enhancement. Since PPG prediction and speech enhancement benefit from each other, a PPG predictor is incorporated into the PAN, and an iterative training algorithm is proposed for it. Experimental results show that using the phonetic information improves enhancement performance in terms of speech intelligibility, perceptual quality, and character error rate. To the best of our knowledge, this is the first work to introduce PPGs into speech enhancement.

Pages: 6634-6638

Page count: 5

29 items in total

[21] Learning Complex Spectral Mapping With Gated Convolutional Recurrent Networks for Monaural Speech Enhancement
Tan, Ke
Wang, DeLiang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 380 - 390
[22] IMPROVING ROBUSTNESS OF DEEP LEARNING BASED MONAURAL SPEECH ENHANCEMENT AGAINST PROCESSING ARTIFACTS
Tan, Ke
Wang, DeLiang
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 6914 - 6918
[23] Joint Optimization of Modified Ideal Radio Mask and Deep Neural Networks for Monaural Speech Enhancement
Han, Wei
Wu, Congming
Zhang, Xiongwei
Zhang, Qiye
Bai, Songting
2017 IEEE 9TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN), 2017, : 1070 - 1074
[24] Supervised monaural speech enhancement using two-level complementary joint sparse representations
Fu, Jiafei
Zhang, Long
Ye, Zhongfu
APPLIED ACOUSTICS, 2018, 132 : 1 - 7
[25] MONAURAL SPEECH ENHANCEMENT BASED ON TWO STAGE LONG SHORT-TERM MEMORY NETWORKS
Xian, Yang
Sun, Yang
Wang, Wenwu
Naqvi, Syed Mohsen
2019 13TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2019,
[26] First coarse, fine afterward: A lightweight two-stage complex approach for monaural speech enhancement
Dang, Feng
Chen, Hangting
Hu, Qi
Zhang, Pengyuan
Yan, Yonghong
SPEECH COMMUNICATION, 2023, 146 : 32 - 44
[27] GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block
Xu, Xinmeng
Wang, Yang
Jia, Jie
Chen, Binbin
Hao, Jianjun
INTERSPEECH 2022, 2022, : 966 - 970
[28] Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement
Li, Andong
Liu, Wenzhe
Zheng, Chengshi
Fan, Cunhang
Li, Xiaodong
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 1829 - 1843
[29] DeConformer-SENet: An efficient deformable conformer speech enhancement network
Li, Man
Liu, Ya
Zhou, Li
DIGITAL SIGNAL PROCESSING, 2025, 156