Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Cited by: 0
Authors
Amini, Jamal [1]
Shahrebabaki, Abdoreza Sabzi [1]
Shokouhi, Navid [1]
Sheikhzadeh, Hamid [1]
Raahemifar, Kaamran [2]
Eslami, Mehdi [1]
Affiliations
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada
Source
2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013) | 2013
Keywords
Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY
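DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract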
Voice conversion typically employs spectral features to convert a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The mean values of the Gaussians are predetermined based on Mel-frequency spacing. The standard deviations are adjusted adaptively using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at the Mel frequencies. The proposed analysis/synthesis method (MFLS-GM) is employed for speech analysis/synthesis and voice conversion. Subjective evaluations employing MOS and ABX tests demonstrate superior performance of voice conversion using MFLS-GM compared to systems employing MFCC features. The computational cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
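The three steps named in the abstract (Mel-spaced means, constant-Q standard deviations, weights sampled from the log-spectrum) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The function names, the common `2595 * log10(1 + f/700)` Mel formula, the rule `sigma = center / q` with `q = 4`, and the normalization of the Gaussian basis are all assumptions made for illustration. The amplitude-dependent adaptation of the standard deviations mentioned in the abstract is omitted, and a generic log-spectrum stands in for the STRAIGHT spectrum.

```python
import numpy as np

def hz_to_mel(f_hz):
    # Common Mel-scale formula (assumed; the abstract does not specify one).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_centers(n_gauss, f_max_hz):
    # Gaussian means: equally spaced on the Mel scale, endpoints excluded.
    mels = np.linspace(0.0, hz_to_mel(f_max_hz), n_gauss + 2)[1:-1]
    return mel_to_hz(mels)

def analyze(log_spec, freqs_hz, centers_hz):
    # Weights: the log-spectrum sampled at the Mel-spaced centers
    # (linear interpolation between the available frequency bins).
    return np.interp(centers_hz, freqs_hz, log_spec)

def synthesize(weights, freqs_hz, centers_hz, q=4.0):
    # Constant-Q assumption: each standard deviation is proportional
    # to its center frequency. The value of q is hypothetical.
    sigmas = centers_hz / q
    z = (freqs_hz[:, None] - centers_hz[None, :]) / sigmas[None, :]
    basis = np.exp(-0.5 * z ** 2)
    # Normalize per frequency bin so the output is a weighted average
    # of the sampled values (an assumption, not taken from the paper).
    basis /= basis.sum(axis=1, keepdims=True)
    return basis @ weights

# Usage on a synthetic, smoothly decaying log-spectrum.
freqs = np.linspace(0.0, 8000.0, 513)
centers = mel_centers(40, 8000.0)
log_spec = np.log(1.0 + np.exp(-freqs / 2000.0))
weights = analyze(log_spec, freqs, centers)
reconstructed = synthesize(weights, freqs, centers)
```

In this sketch analysis reduces to interpolation at fixed frequencies, so no per-frame fitting is needed.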
Pages: 428-433
Page count: 6