The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion

Cited by: 4

Authors:

Chen, Ling-Hui [1,2]

Liu, Li-Juan [2]

Ling, Zhen-Hua [1]

Jiang, Yuan [2]

Dai, Li-Rong [1]

Affiliations:

[1] Univ Sci & Technol China, Natl Engn Lab Speech & Language Informat Proc, Hefei, Anhui, Peoples R China

[2] IFLYTEK Res, Hefei, Anhui, Peoples R China

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

voice conversion; frequency warping; DNN; RNN; LSTM;

DOI:

10.21437/Interspeech.2016-456

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

This paper introduces the methods we adopt to build our system for the evaluation event of Voice Conversion Challenge (VCC) 2016. We propose to use neural network-based approaches to convert both spectral and excitation features. First, the generatively trained deep neural network (GTDNN) is adopted for spectral envelope conversion after the spectral envelopes have been pre-processed by frequency warping. Second, we propose to use a recurrent neural network (RNN) with long short-term memory (LSTM) cells for F0 trajectory conversion. In addition, we adopt a DNN for band aperiodicity conversion. Both internal tests and formal VCC evaluation results demonstrate the effectiveness of the proposed methods.


Pages: 1642-1646

Page count: 5

References (20 in total):

[1] Abe M., 1988, ICASSP 88: 1988 International Conference on Acoustics, Speech, and Signal Processing (Cat. No.88CH2561-9), P655, DOI 10.1109/ICASSP.1988.196671

[2] [Anonymous], 2004, LREC

[3] Chen C. J., 1997, EUROSPEECH

[4] Chen LH, 2013, INTERSPEECH, P3051

[5] Chen, Ling-Hui; Ling, Zhen-Hua; Liu, Li-Juan; Dai, Li-Rong. Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12): 1859-1872

[6] Felps, Daniel; Bortfeld, Heather; Gutierrez-Osuna, Ricardo. Foreign accent conversion in computer assisted pronunciation training [J]. SPEECH COMMUNICATION, 2009, 51 (10): 920-932

[7] Hinton, Geoffrey; Deng, Li; Yu, Dong; Dahl, George E.; Mohamed, Abdel-rahman; Jaitly, Navdeep; Senior, Andrew; Vanhoucke, Vincent; Nguyen, Patrick; Sainath, Tara N.; Kingsbury, Brian. Deep Neural Networks for Acoustic Modeling in Speech Recognition [J]. IEEE SIGNAL PROCESSING MAGAZINE, 2012, 29 (06): 82-97

[8] Kain A, 1998, INT CONF ACOUST SPEE, P285, DOI 10.1109/ICASSP.1998.674423

[9] Kawahara, H; Masuda-Katsuse, I; de Cheveigné, A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds [J]. SPEECH COMMUNICATION, 1999, 27 (3-4): 187-207

[10] Nakashika T, 2013, INTERSPEECH, P369
