PAN: PHONEME-AWARE NETWORK FOR MONAURAL SPEECH ENHANCEMENT

Cited by: 0

Authors:

Du, Zhihao [1]

Lei, Ming [2]

Han, Jiqing [1]

Zhang, Shiliang [2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China

[2] Alibaba Grp, Machine Intelligence Technol, Hangzhou, Peoples R China

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

National Natural Science Foundation of China;

Keywords:

Monaural speech enhancement; phonetic posteriorgram; phoneme-aware network;

DOI:

10.1109/icassp40776.2020.9054334

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Current methods for monaural speech enhancement rely mainly on acoustic information and seldom consider the phonetic information of an utterance. In the voice conversion community, significant progress has been achieved by exploiting phonetic information through phonetic posteriorgrams (PPGs). Inspired by this progress, we propose a phoneme-aware network (PAN) that uses noisy PPGs for speech enhancement. Since PPG prediction and speech enhancement benefit from each other, a PPG predictor is incorporated into the PAN, and an iterative training algorithm is proposed for it. Experimental results show that using phonetic information improves enhancement performance in terms of speech intelligibility, perceptual quality, and character error rate. To the best of our knowledge, this is the first work to introduce PPGs into speech enhancement.

Citation

Pages: 6634-6638

Page count: 5

29 items in total

[11] A novel target decoupling framework based on waveform-spectrum fusion network for monaural speech enhancement
Yu, Runxiang
Chen, Wenzhuo
Ye, Zhongfu
DIGITAL SIGNAL PROCESSING, 2023, 141
[12] Supervised Monaural Speech Enhancement Using Complementary Joint Sparse Representations
Luo, You
Bao, Guangzhao
Xu, Yangfei
Ye, Zhongfu
IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (02) : 237 - 241
[13] Psychoacoustic model-driven spectral subtraction for monaural speech enhancement
Upadhyay N.
International Journal of Speech Technology, 2023, 26 (04) : 963 - 979
[14] A Joint Framework of Denoising Autoencoder and Generative Vocoder for Monaural Speech Enhancement
Du, Zhihao
Zhang, Xueliang
Han, Jiqing
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1493 - 1505
[15] Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement
Wu, Haibin
Tan, Ke
Xu, Buye
Kumar, Anurag
Wong, Daniel
INTERSPEECH 2023, 2023 : 3889 - 3893
[16] Online Monaural Speech Enhancement Based on Periodicity Analysis and A Priori SNR Estimation
Chen, Zhangli
Hohmann, Volker
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (11) : 1904 - 1916
[17] Improving Monaural Speech Enhancement by Mapping to Fixed Simulation Space With Knowledge Distillation
Xu, Xinmeng
IEEE SIGNAL PROCESSING LETTERS, 2024, 31 : 386 - 390
[18] DNN-based monaural speech enhancement with temporal and spectral variations equalization
Kang, Tae Gyoon
Shin, Jong Won
Kim, Nam Soo
DIGITAL SIGNAL PROCESSING, 2018, 74 : 102 - 110
[19] MFT-CRN: Multi-scale Fourier Transform for Monaural Speech Enhancement
Wang, Yulong
Zhang, Xueliang
INTERSPEECH 2023, 2023 : 1060 - 1064
[20] JOINT LEARNING WITH SHARED LATENT SPACE FOR SELF-SUPERVISED MONAURAL SPEECH ENHANCEMENT
Li, Yi
Sun, Yang
Wang, Wenwu
Naqvi, Syed Mohsen
2023 SENSOR SIGNAL PROCESSING FOR DEFENCE CONFERENCE, SSPD, 2023 : 21 - 25
