INTEGRATION OF SPEAKER AND PITCH ADAPTIVE TRAINING FOR HMM-BASED SINGING VOICE SYNTHESIS

Cited by: 0
Authors
Shirota, Kanako [1 ]
Nakamura, Kazuhiro [1 ]
Hashimoto, Kei [1 ]
Oura, Keiichiro [1 ]
Nankaku, Yoshihiko [1 ]
Tokuda, Keiichi [1 ]
Affiliations
[1] Nagoya Inst Technol, Dept Sci & Engn Simulat, Nagoya, Aichi, Japan
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
singing voice synthesis; hidden Markov model; speaker adaptive training; pitch adaptive training; SPEECH; ADAPTATION;
DOI
Not available
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Codes
070206 ; 082403 ;
Abstract
A statistical parametric approach to singing voice synthesis based on hidden Markov models (HMMs) has been growing in popularity over the last few years. In this approach, the spectrum, excitation, vibrato, and duration of the singing voice are simultaneously modeled with context-dependent HMMs, and waveforms are generated from the HMMs themselves. Since HMM-based singing voice synthesis systems are "corpus-based," HMMs corresponding to contextual factors that rarely appear in the training data cannot be well trained. However, it may be difficult to prepare a sufficiently large amount of singing voice data sung by a single singer. Furthermore, the pitches appearing in each song are imbalanced and limited by the singer's vocal range. In this paper, we propose "singer adaptive training," which alleviates this data sparseness problem. Experimental results demonstrated that the proposed technique improved the quality of the synthesized singing voices.
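As a rough illustration of the adaptive-training idea behind the paper (a sketch assuming the standard CMLLR-style formulation of speaker adaptive training, not the authors' exact equations): the canonical HMM parameters \lambda and one affine feature transform per singer s are estimated jointly, so that inter-singer variation is absorbed by the transforms rather than by the canonical model,

(\hat{\lambda}, \{\hat{W}_s\}) = \arg\max_{\lambda, \{W_s\}} \prod_{s} p(O^{(s)} \mid \lambda, W_s), \qquad \hat{o}_t = A_s o_t + b_s, \quad W_s = [A_s \; b_s],

where O^{(s)} is the training data of singer s and (A_s, b_s) is that singer's transform. Pitch adaptive training can be read analogously, with the log F0 observations normalized with respect to the note pitch of the musical score before the canonical model is estimated; the notation above is illustrative only.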
Pages: 5