Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion

Times cited: 0
Authors
Sisman, Berrak [1]
Li, Haizhou [1]
Affiliation
[1] Natl Univ Singapore, Singapore, Singapore
Source
19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018
Keywords
Wavelet transform; prosody analysis; voice conversion
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Thus far, voice conversion studies have focused mainly on spectral conversion. However, speaker identity is also characterized by prosodic features, such as the fundamental frequency (F0) and energy contours. We believe that a better understanding of speaker dependent and speaker independent prosodic features will allow us to devise an analytic approach that handles voice conversion more effectively. We consider that speaker dependent features reflect a speaker's individuality, while speaker independent features reflect the expression of linguistic content. Therefore, the former should be converted, while the latter should be carried over from source to target during conversion. To achieve this, we use the wavelet transform to analyze speaker dependent and speaker independent prosody patterns at different temporal scales. The centrepiece of this paper is the understanding that a speech utterance can be characterized by speaker dependent and speaker independent features in its prosodic manifestations. Experiments show that the proposed prosody analysis scheme consistently improves prosody conversion performance under the sparse representation framework.
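As an illustration only, the sketch below shows one common way to split an F0 contour into temporal scales with a continuous wavelet transform, the kind of multi-scale representation the abstract describes analyzing. It interpolates unvoiced frames, takes log F0, z-score normalizes it, and convolves the result with Mexican hat (Ricker) wavelets at ten scales spaced one octave apart. The ten-scale setup, the base scale of 2.5 frames, the kernel truncation at 5 widths, and the function names `preprocess_f0`, `mexican_hat` and `cwt_decompose` are assumptions made for this sketch, not details taken from the paper; deciding which scales carry speaker dependent versus speaker independent information is the subject of the paper and is not reproduced here.

```python
import numpy as np


def preprocess_f0(f0):
    """Turn a raw F0 track (Hz, 0 = unvoiced) into a continuous, z-scored log-F0 contour."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("F0 track contains no voiced frames")
    frames = np.arange(len(f0))
    # Linearly interpolate log F0 across unvoiced frames; np.interp holds the
    # first/last voiced value constant beyond the ends of the voiced region.
    log_f0 = np.interp(frames, frames[voiced], np.log(f0[voiced]))
    std = log_f0.std()
    # Guard against a perfectly flat contour (zero variance).
    return (log_f0 - log_f0.mean()) / (std if std > 0 else 1.0)


def mexican_hat(t, scale):
    """Mexican hat (Ricker) wavelet of width `scale`, sampled at frame offsets t."""
    x = t / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)


def cwt_decompose(contour, num_scales=10, base_scale=2.5):
    """Return a (num_scales, len(contour)) array of CWT coefficients.

    Scale i has width base_scale * 2**i frames, so adjacent scales are one
    octave apart; row 0 captures the fastest variations, the last row the slowest.
    """
    contour = np.asarray(contour, dtype=float)
    n = len(contour)
    coeffs = np.empty((num_scales, n))
    for i in range(num_scales):
        scale = base_scale * 2.0 ** i
        half = int(np.ceil(5 * scale))  # truncate the kernel at +/- 5 widths
        kernel = mexican_hat(np.arange(-half, half + 1), scale)
        # Full convolution (implicit zero padding outside the contour), then keep
        # the n samples centred on the input so each row aligns frame by frame.
        full = np.convolve(contour, kernel, mode="full")
        coeffs[i] = full[half:half + n]
    return coeffs


# Usage on a synthetic contour (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
frames = np.arange(300)
f0 = 120.0 + 20.0 * np.sin(2 * np.pi * frames / 100)
f0[rng.random(300) < 0.2] = 0.0  # mark roughly 20% of frames as unvoiced
coeffs = cwt_decompose(preprocess_f0(f0))
print(coeffs.shape)  # prints (10, 300)
```

Full convolution followed by a centred slice is used instead of `mode="same"` because, at the coarsest scales, the kernel is longer than a typical utterance, and `np.convolve` in `"same"` mode returns an array whose length is the longer of its two inputs.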
Pages: 52-56
Number of pages: 5