HMM-Based Voice Conversion Using Quantized F0 Context

被引:8
|
作者
Nose, Takashi [1 ]
Ota, Yuhei [1 ]
Kobayashi, Takao [1 ]
机构
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
来源
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010年 / E93D卷 / 09期
关键词
voice conversion; HMM-based speech synthesis; F0; quantization; prosodic context; nonparallel data;
D O I
10.1587/transinf.E93.D.2483
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
引用
收藏
页码:2483 / 2490
页数:8
相关论文
共 50 条
  • [1] Soft context clustering for F0 modeling in HMM-based speech synthesis
    Khorram, Soheil
    Sameti, Hossein
    King, Simon
    EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
  • [2] Soft context clustering for F0 modeling in HMM-based speech synthesis
    Soheil Khorram
    Hossein Sameti
    Simon King
    EURASIP Journal on Advances in Signal Processing, 2015
  • [3] HMM-BASED SPEECH SYNTHESIS WITH UNSUPERVISED LABELING OF ACCENTUAL CONTEXT BASED ON F0 QUANTIZATION AND AVERAGE VOICE MODEL
    Nose, Takashi
    Ooki, Koujirou
    Kobayashi, Takao
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4622 - 4625
  • [4] Superpositional HMM-based intonation synthesis using a functional F0 model
    Ni, Jinfu
    Shiga, Yoshinori
    Hori, Chiori
    2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 270 - 274
  • [5] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
    Ni, Jinfu
    Shiga, Yoshinori
    Hori, Chiori
    JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 273 - 286
  • [6] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
    Jinfu Ni
    Yoshinori Shiga
    Chiori Hori
    Journal of Signal Processing Systems, 2016, 82 : 273 - 286
  • [7] Speaker-independent HMM-based Voice Conversion Using Quantized Fundamental Frequency
    Nose, Takashi
    Kobayashi, Takao
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1724 - 1727
  • [8] Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion
    Nose, Takashi
    Kobayashi, Takao
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 578 - 581
  • [9] Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis
    Lei, Ming
    Wu, Yi-Jian
    Ling, Zhen-Hua
    Dai, Li-Rong
    2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 613 - +
  • [10] Context-dependent additive log F0 model for HMM-based speech synthesis
    Zen, Heiga
    Braunschweiler, Norbert
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2039 - 2042