Speaker and Phoneme-Aware Speech Bandwidth Extension with Residual Dual-Path Network

Cited by: 6
Authors
Hou, Nana [1]
Xu, Chenglin [1,4]
Van Tung Pham [1]
Zhou, Joey Tianyi [3]
Chng, Eng Siong [1,2]
Li, Haizhou [4,5]
Affiliations
[1] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore
[2] Nanyang Technol Univ, Temasek Labs, Singapore, Singapore
[3] ASTAR, Inst High Performance Comp IHPC, Singapore, Singapore
[4] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore, Singapore
[5] Univ Bremen, Machine Listening Lab, Bremen, Germany
Source
INTERSPEECH 2020 | 2020
Funding
National Research Foundation, Singapore;
Keywords
Speech bandwidth extension; Residual dual-path network; Speaker and phoneme knowledge; I-vector; Phonetic posteriorgram; WAVE;
DOI
10.21437/Interspeech.2020-1994
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
Speech bandwidth extension aims to generate a wideband signal from a narrowband (low-band) input by predicting the missing high-frequency components. It is believed that the general knowledge about the speaker and phonetic content strengthens the prediction. In this paper, we propose to augment the low-band acoustic features with i-vector and phonetic posteriorgram (PPG), which represent speaker and phonetic content of the speech, respectively. We also propose a residual dual-path network (RDPN) as the core module to process the augmented features, which fully utilizes the utterance-level temporal continuity information and avoids gradient vanishing. Experiments show that the proposed method achieves 20.2% and 7.0% relative improvements over the best baseline in terms of log-spectral distortion (LSD) and signal-to-noise ratio (SNR), respectively. Furthermore, our method is 16 times more compact than the best baseline in terms of the number of parameters.
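The abstract names two ideas that a short sketch can make concrete: augmenting frame-level low-band features with speaker (i-vector) and phonetic (PPG) information, and scoring the result with LSD and SNR. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. The record does not say how the features are fused, so frame-wise concatenation is an assumption. The LSD and SNR formulas are common textbook definitions and may differ from the paper's in log base or scaling. The function names `augment_features`, `log_spectral_distortion`, and `snr_db` are hypothetical.

```python
import math


def augment_features(low_band, ivector, ppg):
    """Append an utterance-level i-vector and a per-frame PPG to each frame.

    low_band: list of T frames, each a list of F acoustic features.
    ivector:  list of D values shared by every frame of the utterance.
    ppg:      list of T frames, each a list of P phoneme posteriors.
    Returns T frames of length F + D + P (assumed fusion: concatenation).
    """
    if len(low_band) != len(ppg):
        raise ValueError("low_band and ppg must have the same number of frames")
    return [list(lb) + list(ivector) + list(p) for lb, p in zip(low_band, ppg)]


def log_spectral_distortion(ref_mag, est_mag, eps=1e-10):
    """Mean over frames of the RMS difference of log10 power spectra.

    ref_mag, est_mag: lists of frames, each a list of K magnitudes.
    Lower is better. eps guards against log(0).
    """
    per_frame = []
    for ref_frame, est_frame in zip(ref_mag, est_mag):
        squared = [
            (math.log10(r * r + eps) - math.log10(e * e + eps)) ** 2
            for r, e in zip(ref_frame, est_frame)
        ]
        per_frame.append(math.sqrt(sum(squared) / len(squared)))
    return sum(per_frame) / len(per_frame)


def snr_db(ref, est, eps=1e-10):
    """Signal-to-noise ratio in dB between reference and estimated waveforms."""
    signal = sum(r * r for r in ref)
    noise = sum((r - e) ** 2 for r, e in zip(ref, est))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


if __name__ == "__main__":
    frames = augment_features(
        [[1.0, 2.0], [3.0, 4.0]],   # two frames of low-band features
        [0.5],                      # toy one-dimensional i-vector
        [[0.9, 0.1], [0.2, 0.8]],   # toy two-class PPG
    )
    print(len(frames), len(frames[0]))
    print(log_spectral_distortion([[1.0, 2.0]], [[1.0, 2.0]]))
    print(snr_db([1.0, 1.0, 1.0, 1.0], [0.9, 0.9, 0.9, 0.9]))
```

Because the i-vector is computed once per utterance while the PPG varies per frame, the sketch repeats the same i-vector in every frame. A real system would work on array or tensor types rather than nested lists.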
Pages: 4064-4068
Number of pages: 5
Related papers
36 in total
  • [1] Abel J, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5469, DOI 10.1109/ICASSP.2018.8462362
  • [2] EFFECTIVENESS OF LINEAR PREDICTION CHARACTERISTICS OF SPEECH WAVE FOR AUTOMATIC SPEAKER IDENTIFICATION AND VERIFICATION
    ATAL, BS
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1974, 55 (06) : 1304 - 1312
  • [3] Ba J. L., 2016, LAYER NORMALIZATION, arXiv:1607.06450
  • [4] Bachhav P, 2019, INT CONF ACOUST SPEE, P7010, DOI 10.1109/ICASSP.2019.8683611
  • [5] Eskimez SE, 2019, INT CONF ACOUST SPEE, P3717, DOI 10.1109/ICASSP.2019.8682215
  • [6] Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks
    Gu, Yu
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 297 - 301
  • [7] Gupta A, 2019, IEEE WORK APPL SIG, P205, DOI 10.1109/WASPAA.2019.8937169
  • [8] Hao X, 2020, INT CONF ACOUST SPEE, P866, DOI 10.1109/ICASSP40776.2020.9054551
  • [9] Hou NN, 2019, ASIAPAC SIGN INFO PR, P667, DOI 10.1109/APSIPAASC47483.2019.9023218
  • [10] ITU-T Rec. P.862.2, 2005, Wideband extension to Recommendation P.862 for the assessment of wideband telephone networks and speech codecs