Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Cited by: 0
Authors
Amini, Jamal [1]
Shahrebabaki, Abdoreza Sabzi [1]
Shokouhi, Navid [1]
Sheikhzadeh, Hamid [1]
Raahemifar, Kaamran [2]
Eslami, Mehdi [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013) | 2013
Keywords
Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY;
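DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract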
Voice conversion typically employs spectral features to convert a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The means of the Gaussians are predetermined according to Mel-frequency spacing. The standard deviations are adjusted adaptively using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at the Mel frequencies. The proposed method (MFLS-GM) is applied to speech analysis/synthesis and voice conversion. Subjective evaluations using MOS and ABX tests show that voice conversion with MFLS-GM outperforms systems that use MFCC features. The computational cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
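The three steps named in the abstract (Mel-spaced means, constant-Q standard deviations, weights sampled from the log-spectrum) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every detail beyond the abstract is an assumption: the HTK-style Mel formula, the rule `sigma = mean / q`, the normalised weighted-sum reconstruction, and all function and parameter names are hypothetical. The amplitude-dependent adjustment of the standard deviations mentioned in the abstract is omitted because the record does not describe it.

```python
import math

def hz_to_mel(f_hz):
    # HTK-style Mel formula (an assumption; the record does not specify one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_spaced_means(n_gauss, f_min, f_max):
    """Gaussian means in Hz, equally spaced on the Mel scale."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_gauss - 1)
    return [mel_to_hz(m_lo + k * step) for k in range(n_gauss)]

def constant_q_sigmas(means, q):
    """Assumed constant-Q rule: width proportional to centre frequency."""
    return [m / q for m in means]

def interp(x, xs, ys):
    """Linear interpolation on a sorted grid, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    lo, hi = 0, len(xs) - 1
    while hi - lo > 1:  # binary search for the enclosing interval
        mid = (lo + hi) // 2
        if xs[mid] <= x:
            lo = mid
        else:
            hi = mid
    t = (x - xs[lo]) / (xs[hi] - xs[lo])
    return ys[lo] + t * (ys[hi] - ys[lo])

def analyse(freqs, log_spec, means):
    """Weights: the log-spectrum sampled at the Mel-spaced means."""
    return [interp(m, freqs, log_spec) for m in means]

def synthesise(freqs, means, sigmas, weights):
    """Assumed reconstruction: normalised weighted sum of Gaussians."""
    out = []
    for f in freqs:
        g = [math.exp(-0.5 * ((f - m) / s) ** 2) for m, s in zip(means, sigmas)]
        total = max(sum(g), 1e-300)  # guard against division by zero
        out.append(sum(w * gk for w, gk in zip(weights, g)) / total)
    return out

# Usage on a synthetic, smoothly decaying log-spectrum (illustrative data only).
freqs = [50.0 + 10.0 * i for i in range(796)]  # 50 Hz .. 8000 Hz
log_spec = [-f / 2000.0 for f in freqs]
means = mel_spaced_means(40, 50.0, 8000.0)
sigmas = constant_q_sigmas(means, q=4.0)
weights = analyse(freqs, log_spec, means)
approx = synthesise(freqs, means, sigmas, weights)
```

In this sketch each frame is represented only by its 40 weights, since the means and standard deviations are fixed in advance. That makes the weights a natural candidate for the features a conversion model would map, although the record itself does not say how the conversion is performed.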
Pages: 428-433
Number of pages: 6
Related papers
50 items in total
[41]   Voice conversion using structured Gaussian mixture model in cepstrum eigenspace [J].
LI Yangchun ;
YU Yibiao .
Chinese Journal of Acoustics, 2015, 34 (03) :325-336
[42]   Voice conversion using Viterbi algorithm based on Gaussian mixture model [J].
Jian Zhi-Hua ;
Yang Zhen .
2007 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATION SYSTEMS, VOLS 1 AND 2, 2007, :40-43
[43]   Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion [J].
Ding, Yi-Yang ;
Lin, Hao-Jian ;
Liu, Li-Juan ;
Ling, Zhen-Hua ;
Hu, Yu .
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 :3415-3426
[44]   Voice Source Waveform Analysis and Synthesis using Principal Component Analysis and Gaussian Mixture Modelling [J].
Gudnason, Jon ;
Thomas, Mark R. P. ;
Naylor, Patrick A. ;
Ellis, Dan P. W. .
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, :120-+
[45]   STATISTICAL APPROACH TO ENHANCING ESOPHAGEAL SPEECH BASED ON GAUSSIAN MIXTURE MODELS [J].
Doi, Hironori ;
Nakamura, Keigo ;
Toda, Tomoki ;
Saruwatari, Hiroshi ;
Shikano, Kiyohiro .
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, :4250-4253
[46]   Speech Emotion Classification via a Modified Gaussian Mixture Model Approach [J].
Hosseini, Zeinab ;
Ahadi, Seyed Mohammad ;
Faraji, Neda .
2014 7TH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2014, :487-491
[47]   MANDARIN ELECTROLARYNGEAL SPEECH VOICE CONVERSION WITH SEQUENCE-TO-SEQUENCE MODELING [J].
Yen, Ming-Chi ;
Huang, Wen-Chin ;
Kobayashi, Kazuhiro ;
Peng, Yu-Huai ;
Tsai, Shu-Wei ;
Tsao, Yu ;
Toda, Tomoki ;
Jang, Jyh-Shing Roger ;
Wang, Hsin-Min .
2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, :650-657
[48]   AN EVALUATION OF ALARYNGEAL SPEECH ENHANCEMENT METHODS BASED ON VOICE CONVERSION TECHNIQUES [J].
Doi, Hironori ;
Nakamura, Keigo ;
Toda, Tomoki ;
Saruwatari, Hiroshi ;
Shikano, Kiyohiro .
2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, :5136-5139
[49]   A Dual Alignment Scheme for Improved Speech-to-Singing Voice Conversion [J].
Vijayan, Karthika ;
Dong, Minghui ;
Li, Haizhou .
2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, :1598-1606
[50]   WaveVC: Speech and Fundamental Frequency Consistent Raw Audio Voice Conversion [J].
Ko, Kyungdeuk ;
Kim, Donghyeon ;
Oh, Kyungseok ;
Ko, Hanseok .
NEURAL PROCESSING LETTERS, 2024, 56 (04)