STATE MAPPING FOR CROSS-LANGUAGE SPEAKER ADAPTATION IN TTS

Times cited: 12
Authors
Chen, Yi-Ning [1]
Jiao, Yang [1]
Qian, Yao [1]
Soong, Frank K. [1]
Affiliation
[1] Microsoft Research Asia, Beijing, People's Republic of China
Source
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009
Keywords
HMM-based TTS; Speaker adaptation; Cross language; Kullback-Leibler divergence
DOI
10.1109/ICASSP.2009.4960573
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Cross-language speaker adaptation has many interesting applications, e.g., speech-to-speech translation. In cross-language adaptation, however, the common phoneme set that is assumed when adapting between different speakers of the same language no longer exists. Instead, a nearest-neighbor phoneme mapping from one language to the other has been adopted. In this study, we apply our recently proposed sub-phonemic HMM state mapping to cross-language adaptation. Because each sub-phonemic HMM state models only a segment of a phone, such states tend to be more shareable across languages than whole phonemes. The Kullback-Leibler divergence (KLD), an information-theoretic measure, is used to quantify the similarity between states in different languages. Experimental results show that the new state mapping outperforms the phoneme-mapping baseline on three objective measures: log spectral distance, F0 adaptation error, and F0 correlation. Compared with intra-language adaptation, the cross-language results of the new algorithm are also fairly decent.
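The abstract names the distance measure but not the exact mapping procedure. The sketch below illustrates KLD-based nearest-neighbor state mapping under assumptions that are not stated in the source: each HMM state is reduced to a single diagonal-covariance Gaussian (multi-stream spectral/F0 models and mixtures are ignored), a symmetric KLD serves as the distance, and each target-language state is mapped to its closest source-language state. The function names (kld_diag_gauss, symmetric_kld, map_states) and the state labels are hypothetical.

```python
import numpy as np


def kld_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians.

    KL = 0.5 * sum(log(var_q / var_p) + (var_p + (mu_p - mu_q)^2) / var_q - 1)
    """
    mu_p, var_p, mu_q, var_q = (np.asarray(x, dtype=float) for x in (mu_p, var_p, mu_q, var_q))
    return 0.5 * float(np.sum(np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0))


def symmetric_kld(mu_p, var_p, mu_q, var_q):
    """Symmetrized KLD (assumption: the direction-free sum of both KLDs)."""
    return kld_diag_gauss(mu_p, var_p, mu_q, var_q) + kld_diag_gauss(mu_q, var_q, mu_p, var_p)


def map_states(target_states, source_states):
    """Map each target-language state to the source-language state with minimum symmetric KLD.

    Both arguments are dicts: state label -> (mean vector, variance vector).
    """
    mapping = {}
    for t_name, (t_mu, t_var) in target_states.items():
        mapping[t_name] = min(
            source_states,
            key=lambda s: symmetric_kld(t_mu, t_var, *source_states[s]),
        )
    return mapping


# Hypothetical toy states: two source-language states, one target-language state.
source = {
    "en_a_s2": ([0.0, 1.0], [1.0, 1.0]),
    "en_i_s2": ([3.0, -1.0], [0.5, 2.0]),
}
target = {"zh_a_s2": ([0.2, 0.9], [1.1, 0.9])}
print(map_states(target, source))  # {'zh_a_s2': 'en_a_s2'}
```

Symmetrizing the divergence makes the distance independent of which language is treated as the reference; whether the paper uses a symmetric or one-directional KLD, and in which direction states are mapped, is not stated in this abstract.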
Pages: 4273-4276
Page count: 4