HMM-Based Voice Conversion Using Quantized F0 Context

Cited by: 8

Authors:

Nose, Takashi [1]

Ota, Yuhei [1]

Kobayashi, Takao [1]

Affiliations:

[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan

Source:

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS | 2010 / Vol. E93D / No. 09

Keywords:

voice conversion; HMM-based speech synthesis; F0; quantization; prosodic context; nonparallel data;

DOI:

10.1587/transinf.E93.D.2483

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker and transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phonemes and F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In model training, the models of the source and target speakers can be trained separately; hence, there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that segment-based voice conversion with phonetic and prosodic contexts works effectively even when parallel speech data are not available.
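The analysis-side steps described in the abstract (quantizing F0 per phoneme segment and building a context-dependent label sequence with durations) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the quantization scheme or the label format, so the uniform quantization of per-segment mean log F0, the `num_levels` parameter, the `x` symbol for unvoiced segments, the `sil` boundary padding, and the `prev-cur+next/F0:sym` label layout are all assumptions introduced here.

```python
import math


def quantize_f0(segments, num_levels=4):
    """Map each phoneme segment to a quantized F0 symbol.

    segments: list of (phoneme, duration_in_frames, f0_values_hz);
              an F0 value of 0 marks an unvoiced frame (assumed convention).
    Returns a list of (phoneme, duration_in_frames, symbol), where symbol is
    "0".."num_levels-1" for voiced segments and "x" for unvoiced ones.
    """
    # Mean log F0 over the voiced frames of each segment (None if unvoiced).
    means = []
    for _, _, f0 in segments:
        voiced = [math.log(v) for v in f0 if v > 0]
        means.append(sum(voiced) / len(voiced) if voiced else None)

    voiced_means = [m for m in means if m is not None]
    lo = min(voiced_means) if voiced_means else 0.0
    hi = max(voiced_means) if voiced_means else 0.0
    span = hi - lo

    quantized = []
    for (phoneme, duration, _), mean in zip(segments, means):
        if mean is None:
            symbol = "x"  # hypothetical symbol for unvoiced segments
        elif span == 0:
            symbol = "0"  # all voiced segments share one level
        else:
            # Uniform quantization of the speaker's observed log F0 range.
            level = min(int((mean - lo) / span * num_levels), num_levels - 1)
            symbol = str(level)
        quantized.append((phoneme, duration, symbol))
    return quantized


def make_labels(quantized):
    """Build (label, duration) pairs with phonetic and prosodic context."""
    labels = []
    for i, (phoneme, duration, symbol) in enumerate(quantized):
        prev_ph = quantized[i - 1][0] if i > 0 else "sil"
        next_ph = quantized[i + 1][0] if i < len(quantized) - 1 else "sil"
        # Hypothetical label layout: triphone context plus the F0 symbol.
        labels.append((f"{prev_ph}-{phoneme}+{next_ph}/F0:{symbol}", duration))
    return labels
```

In a setup like this, only the phoneme identities, durations, and F0 symbols would need to be passed to the synthesis side, where the target speaker's pre-trained context-dependent HMMs would generate speech from the resulting label sequence.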


Pages: 2483-2490

Page count: 8

50 items in total

[1] Soft context clustering for F0 modeling in HMM-based speech synthesis
Khorram, Soheil
Sameti, Hossein
King, Simon
EURASIP JOURNAL ON ADVANCES IN SIGNAL PROCESSING, 2015,
[2] Soft context clustering for F0 modeling in HMM-based speech synthesis
Soheil Khorram
Hossein Sameti
Simon King
EURASIP Journal on Advances in Signal Processing, 2015
[3] HMM-BASED SPEECH SYNTHESIS WITH UNSUPERVISED LABELING OF ACCENTUAL CONTEXT BASED ON F0 QUANTIZATION AND AVERAGE VOICE MODEL
Nose, Takashi
Ooki, Koujirou
Kobayashi, Takao
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4622 - 4625
[4] Superpositional HMM-based intonation synthesis using a functional F0 model
Ni, Jinfu
Shiga, Yoshinori
Hori, Chiori
2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 270 - 274
[5] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
Ni, Jinfu
Shiga, Yoshinori
Hori, Chiori
JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2016, 82 (02): : 273 - 286
[6] Superpositional HMM-Based Intonation Synthesis Using a Functional F0 Model
Jinfu Ni
Yoshinori Shiga
Chiori Hori
Journal of Signal Processing Systems, 2016, 82 : 273 - 286
[7] Speaker-independent HMM-based Voice Conversion Using Quantized Fundamental Frequency
Nose, Takashi
Kobayashi, Takao
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1724 - 1727
[8] Quantized F0 Context and Its Applications to Speech Synthesis, Speech Coding and Voice Conversion
Nose, Takashi
Kobayashi, Takao
2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 578 - 581
[9] Investigation of Prosodic F0 Layers in Hierarchical F0 Modeling for HMM-based Speech Synthesis
Lei, Ming
Wu, Yi-Jian
Ling, Zhen-Hua
Dai, Li-Rong
2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, : 613 - +
[10] Context-dependent additive log F0 model for HMM-based speech synthesis
Zen, Heiga
Braunschweiler, Norbert
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2039 - 2042
