Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Citations: 0
Authors
Amini, Jamal [1]
Shahrebabaki, Abdoreza Sabzi [1]
Shokouhi, Navid [1]
Sheikhzadeh, Hamid [1]
Raahemifar, Kaamran [2]
Eslami, Mehdi [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013) | 2013
Keywords
Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY;
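DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract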
Voice conversion typically employs spectral features to convert a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The mean values of the Gaussians are pre-determined based on Mel-frequency spacing. The standard deviations are adaptively adjusted using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at Mel-frequencies. The proposed analysis/synthesis method (MFLS-GM) is employed for speech analysis/synthesis and voice conversion. Subjective evaluations employing MOS and ABX tests demonstrate superior performance of voice conversion using MFLS-GM compared to systems employing MFCC features. The computational cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
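The three steps named in the abstract (Mel-spaced means, bandwidth-scaled standard deviations, weights sampled from the log-spectrum) can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the mel formula (the common 2595*log10(1 + f/700) form), the function names, the `width_factor` parameter, the use of local center spacing as a stand-in for the constant-Q rule, and the normalized sum used for resynthesis are all assumptions made here for illustration. The amplitude-dependent adjustment of the standard deviations is left out.

```python
import math
from bisect import bisect_right

def hz_to_mel(f_hz):
    # Common mel formula; the paper's exact variant is not stated (assumption).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(n_gauss, f_max_hz):
    # Gaussian means: equally spaced on the mel scale from 0 Hz to f_max_hz.
    step = hz_to_mel(f_max_hz) / (n_gauss - 1)
    return [mel_to_hz(i * step) for i in range(n_gauss)]

def spacing_sigmas(centers_hz, width_factor=0.5):
    # Hypothetical stand-in for the constant-Q rule: each sigma is a fixed
    # fraction of the local center spacing. Mel spacing grows roughly in
    # proportion to frequency at high frequencies, so bandwidth does too.
    n = len(centers_hz)
    sigmas = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        sigmas.append(width_factor * (centers_hz[hi] - centers_hz[lo]) / (hi - lo))
    return sigmas

def interp(x, xs, ys):
    # Linear interpolation on an increasing grid, clamped at both ends.
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])

def sample_log_spectrum(log_spec, freqs_hz, centers_hz):
    # Gaussian weights: the log-spectrum sampled at the mel-spaced centers.
    return [interp(c, freqs_hz, log_spec) for c in centers_hz]

def synthesize(weights, centers_hz, sigmas_hz, freqs_hz):
    # Rebuild a smooth log-spectrum as a normalized sum of Gaussians.
    # The normalization is an assumption of this sketch.
    out = []
    for f in freqs_hz:
        basis = [math.exp(-0.5 * ((f - c) / s) ** 2)
                 for c, s in zip(centers_hz, sigmas_hz)]
        out.append(sum(w * b for w, b in zip(weights, basis)) / sum(basis))
    return out

# Usage on a synthetic, downward-tilted log-spectrum (not real speech data).
freqs = [i * 8000.0 / 256 for i in range(257)]
log_spec = [-f / 2000.0 for f in freqs]
centers = mel_centers(24, 8000.0)
sigmas = spacing_sigmas(centers)
weights = sample_log_spectrum(log_spec, freqs, centers)
reconstructed = synthesize(weights, centers, sigmas, freqs)
```

In this sketch a frame is represented by 24 weights only, because the means and standard deviations are fixed in advance. Modifying the spectrum then amounts to changing the weights before calling `synthesize`.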
Pages: 428-433
Page count: 6