Speech Analysis/Synthesis by Gaussian Mixture Approximation of the Speech Spectrum for Voice Conversion

Cited by: 0

Authors:

Amini, Jamal [1]

Shahrebabaki, Abdoreza Sabzi [1]

Shokouhi, Navid [1]

Sheikhzadeh, Hamid [1]

Raahemifar, Kaamran [2]

Eslami, Mehdi [1]

Affiliations:

[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran

[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON M5B 2K3, Canada

Source:

2013 IEEE INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND INFORMATION TECHNOLOGY (IEEE ISSPIT 2013) | 2013

Keywords:

Analysis/Synthesis; Feature Extraction; Voice Conversion; GMM; STRAIGHT; FREQUENCY;

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Voice conversion typically employs spectral features to convert a source voice to a target voice. In this paper, we propose a simple method of fitting the STRAIGHT spectrum with Gaussian mixture (GM) models for speech analysis/synthesis and spectral modification. The mean values of the Gaussians are pre-determined based on Mel-frequency spacing. The standard deviations are adaptively adjusted using the constant-Q principle and the spectrum amplitudes. Finally, the weights of the Gaussians are determined by sampling the log-spectrum at the Mel frequencies. The proposed analysis/synthesis method (MFLS-GM) is employed for speech analysis/synthesis and voice conversion. Subjective evaluations employing MOS and ABX tests demonstrate superior performance of voice conversion using MFLS-GM compared to systems employing MFCC features. The computational cost of the proposed analysis/synthesis method is also much lower than that of MFCC-based methods.
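The three steps named in the abstract (Mel-spaced means, constant-Q standard deviations, weights sampled from the log-spectrum) can be illustrated with a minimal sketch. This is not the authors' implementation: the Mel formula (the common 2595·log10(1 + f/700) variant), the quality factor `q`, the floor `min_sigma`, the normalised-Gaussian reconstruction in the log domain, and all function names are assumptions made for illustration. The amplitude-dependent adjustment of the standard deviations mentioned in the abstract is not specified there and is left out.

```python
import math

def hz_to_mel(f_hz):
    # Assumed Mel formula; the abstract does not say which variant is used.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_centers(n_gauss, f_min, f_max):
    """Gaussian means (Hz), equally spaced on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_gauss - 1)
    return [mel_to_hz(lo + k * step) for k in range(n_gauss)]

def constant_q_sigmas(centers, q=4.0, min_sigma=20.0):
    """Widths proportional to centre frequency (constant-Q), with a floor.

    q and min_sigma are hypothetical values, not taken from the paper.
    """
    return [max(c / q, min_sigma) for c in centers]

def sample_log_spectrum(log_spec, freqs, centers):
    """Weights: log-spectrum linearly interpolated at each centre.

    freqs must be ascending and the same length as log_spec.
    """
    weights = []
    for c in centers:
        j = 1
        while j < len(freqs) - 1 and freqs[j] < c:
            j += 1
        f0, f1 = freqs[j - 1], freqs[j]
        t = min(max((c - f0) / (f1 - f0), 0.0), 1.0)
        weights.append((1.0 - t) * log_spec[j - 1] + t * log_spec[j])
    return weights

def synthesize(weights, centers, sigmas, freqs):
    """Rebuild a log-spectrum as a normalised weighted sum of Gaussians.

    The normalisation (dividing by the sum of the Gaussians) is an
    assumption of this sketch, chosen so a flat spectrum stays flat.
    """
    out = []
    for f in freqs:
        g = [math.exp(-0.5 * ((f - c) / s) ** 2)
             for c, s in zip(centers, sigmas)]
        out.append(sum(w * gi for w, gi in zip(weights, g)) / sum(g))
    return out
```

A typical call would build the centres once with `mel_centers(40, 0.0, 8000.0)`, derive the widths with `constant_q_sigmas`, and then run `sample_log_spectrum` per frame. Under these assumptions only the weight vector changes from frame to frame, which is consistent with the low analysis cost the abstract reports.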


Pages: 428-433

Page count: 6
