HMM-Based Voice Conversion Using Quantized F0 Context

Cited by: 8
Authors
Nose, Takashi [1]
Ota, Yuhei [1]
Kobayashi, Takao [1]
Affiliations
[1] Tokyo Inst Technol, Interdisciplinary Grad Sch Sci & Engn, Yokohama, Kanagawa 2268502, Japan
Keywords
voice conversion; HMM-based speech synthesis; F0; quantization; prosodic context; nonparallel data
DOI
10.1587/transinf.E93.D.2483
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, the phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker, and are transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phoneme and the F0 symbols. Then, converted speech is generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. In the model training, the models of the source and target speakers can be trained separately, hence there is no need to prepare parallel speech data of the source and target speakers. Objective and subjective experimental results show that the segment-based voice conversion with phonetic and prosodic contexts works effectively even if the parallel speech data is not available.
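The analysis-side pipeline described in the abstract (phoneme segments with durations, a quantized F0 symbol per segment, and a context-dependent label sequence) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the use of the mean log F0 of each phoneme segment, uniform quantization between the utterance's minimum and maximum, the `x` symbol for unvoiced segments, the number of levels, and the `prev-cur+next/F0:q` label format are all hypothetical choices.

```python
import math
from typing import List, Optional, Tuple

# (phoneme, duration in frames, per-frame F0 in Hz; 0.0 marks an unvoiced frame)
Segment = Tuple[str, int, List[float]]


def mean_log_f0(f0_hz: List[float]) -> Optional[float]:
    """Mean log F0 over the voiced frames of a segment, or None if all unvoiced."""
    voiced = [math.log(f) for f in f0_hz if f > 0.0]
    return sum(voiced) / len(voiced) if voiced else None


def quantize(value: Optional[float], lo: float, hi: float, n_levels: int) -> str:
    """Map a log-F0 value to one of n_levels symbols (assumed uniform quantizer)."""
    if value is None:
        return "x"  # hypothetical symbol for an unvoiced segment
    if hi <= lo:
        return "0"  # degenerate range: only one distinct voiced value
    idx = int((value - lo) / (hi - lo) * n_levels)
    return str(min(max(idx, 0), n_levels - 1))


def make_labels(segments: List[Segment], n_levels: int = 4) -> List[Tuple[str, int]]:
    """Build (label, duration) pairs with phonetic and quantized-F0 context."""
    means = [mean_log_f0(f0) for _, _, f0 in segments]
    voiced = [m for m in means if m is not None]
    lo, hi = (min(voiced), max(voiced)) if voiced else (0.0, 0.0)
    symbols = [quantize(m, lo, hi, n_levels) for m in means]

    labels = []
    for i, (phoneme, duration, _) in enumerate(segments):
        prev_ph = segments[i - 1][0] if i > 0 else "sil"
        next_ph = segments[i + 1][0] if i < len(segments) - 1 else "sil"
        # Hypothetical label format: triphone context plus the F0 symbol.
        labels.append((f"{prev_ph}-{phoneme}+{next_ph}/F0:{symbols[i]}", duration))
    return labels


if __name__ == "__main__":
    utterance = [
        ("a", 10, [100.0] * 10),
        ("k", 5, [0.0] * 5),
        ("i", 8, [200.0] * 8),
    ]
    for label, duration in make_labels(utterance):
        print(label, duration)
```

In the scheme the abstract describes, only the label sequence and durations would be passed to the synthesis part, where the target speaker's pre-trained context-dependent HMMs generate the converted speech; that synthesis step is not sketched here.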
Pages: 2483-2490
Number of pages: 8
Related papers
50 in total
[21]   F0 Transformation within the Voice Conversion Framework [J].
Hanzlicek, Zdenek ;
Matousek, Jindrich .
INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, :681-684
[22]   HMM-BASED SEQUENCE-TO-FRAME MAPPING FOR VOICE CONVERSION [J].
Qiao, Yu ;
Saito, Daisuke ;
Minematsu, Nobuaki .
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, :4830-4833
[23]   A STYLE CAPTURING APPROACH TO F0 TRANSFORMATION IN VOICE CONVERSION [J].
Anumanchipalli, Gopala Krishna ;
Oliveira, Luis C. ;
Black, Alan W. .
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, :6915-6919
[24]   A New Method for F0 Tracking Errors Fix and Generation in HMM-based Mandarin Speech Synthesis using Generation Process Model [J].
Wang, Miaomiao ;
Wen, Miaomiao ;
Hirose, Keikichi ;
Minematsu, Nobuaki .
2010 IEEE 10TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS (ICSP2010), VOLS I-III, 2010, :609-612
[25]   Hear Your Face: Face-based voice conversion with F0 estimation [J].
Lee, Jaejun ;
Oh, Yoori ;
Hwang, Injune ;
Lee, Kyogu .
INTERSPEECH 2024, 2024, :4378-4382
[26]   Reducing over-smoothness in HMM-based speech synthesis using exemplar-based voice conversion [J].
Gia-Nhu Nguyen ;
Trung-Nghia Phung .
EURASIP Journal on Audio, Speech, and Music Processing, 2017
[28]   Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion [J].
Huang, Wen-Chin ;
Wu, Yi-Chiao ;
Lo, Chen-Chou ;
Tobing, Patrick Lumban ;
Hayashi, Tomoki ;
Kobayashi, Kazuhiro ;
Toda, Tomoki ;
Tsao, Yu ;
Wang, Hsin-Min .
INTERSPEECH 2019, 2019, :709-713
[29]   HMM-based Speaker Characteristics Emphasis Using Average Voice Model [J].
Nose, Takashi ;
Adada, Junichi ;
Kobayashi, Takao .
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, :2599-2602
[30]   A New HMM-Based Voice Conversion Methodology Evaluated on Monolingual and Cross-Lingual Conversion Tasks [J].
Percybrooks, Winston S. ;
Moore, Elliot .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2015, 23 (12) :2298-2310