Reverberation Modeling for Source-Filter-based Neural Vocoder

Times cited: 2
Authors
Ai, Yang [1]
Wang, Xin [2]
Yamagishi, Junichi [2, 3]
Ling, Zhen-Hua [1]
Affiliations
[1] Univ Sci & Technol China, NELSLIP, Hefei, Peoples R China
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Edinburgh, CSTR, Edinburgh, Midlothian, Scotland
Source
INTERSPEECH 2020 | 2020
Funding
National Natural Science Foundation of China
Keywords
reverberation; room impulse response; source-filter-based model; neural vocoder; SPEECH; GENERATION;
DOI
10.21437/Interspeech.2020-1613
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification code
100104; 100213
Abstract
This paper presents a reverberation module for source-filter-based neural vocoders that improves the performance of reverberant effect modeling. This module uses the output waveform of neural vocoders as an input and produces a reverberant waveform by convolving the input with a room impulse response (RIR). We propose two approaches to parameterizing and estimating the RIR. The first approach assumes a global time-invariant (GTI) RIR and directly learns the values of the RIR on a training dataset. The second approach assumes an utterance-level time-variant (UTV) RIR, which is invariant within one utterance but varies across utterances, and uses another neural network to predict the RIR values. We add the proposed reverberation module to the phase spectrum predictor (PSP) of a HiNet vocoder and jointly train the model. Experimental results demonstrate that the proposed module was helpful for modeling the reverberation effect and improving the perceived quality of generated reverberant speech. The UTV-RIR was shown to be more robust than the GTI-RIR to unknown reverberation conditions and achieved a perceptually better reverberation effect.
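The core operation described in the abstract, convolving the vocoder's output waveform with a room impulse response, can be sketched in a few lines. This is a minimal illustration in plain Python, not the authors' implementation: the function name `apply_rir` and the toy sample values are assumptions made for this example, and in the paper the RIR values are either learned directly (GTI) or predicted per utterance by a neural network (UTV), whereas here they are fixed placeholders.

```python
def apply_rir(dry, rir):
    """Return the full linear convolution of a dry waveform with an RIR.

    Both arguments are lists of float samples. The output has
    len(dry) + len(rir) - 1 samples; an empty input yields an empty list.
    """
    if not dry or not rir:
        return []
    wet = [0.0] * (len(dry) + len(rir) - 1)
    for n, x in enumerate(dry):
        for k, h in enumerate(rir):
            # Each input sample excites a scaled, delayed copy of the RIR.
            wet[n + k] += x * h
    return wet


# Hypothetical toy values: a short "dry" signal and a 3-tap RIR with a
# direct path (1.0) and one reflection two samples later (0.25).
dry = [1.0, 0.5, 0.0, 0.0]
rir = [1.0, 0.0, 0.25]

wet = apply_rir(dry, rir)
print(wet)  # [1.0, 0.5, 0.25, 0.125, 0.0, 0.0]
```

Under the GTI assumption the same `rir` would be applied to every utterance, while under the UTV assumption a different `rir` would be supplied for each utterance. For realistic RIR lengths an FFT-based convolution would be the usual choice over this direct double loop.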
Pages: 3560-3564
Page count: 5