Parallel Dictionary Learning for Voice Conversion Using Discriminative Graph-embedded Non-negative Matrix Factorization

Cited by: 7

Authors:

Aihara, Ryo [1]

Takiguchi, Tetsuya [1]

Ariki, Yasuo [1]

Affiliations:

[1] Kobe Univ, Grad Sch Syst Informat, Nada Ku, 1-1 Rokkodai, Kobe, Hyogo, Japan

Source:

17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES | 2016

Keywords:

voice conversion; speech synthesis; NMF; sparse representation; algorithms

DOI:

10.21437/Interspeech.2016-227

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

This paper proposes a discriminative learning method for Non-negative Matrix Factorization (NMF)-based Voice Conversion (VC). NMF-based VC has been studied because it produces a more natural-sounding voice than conventional Gaussian Mixture Model (GMM)-based VC. In conventional NMF-based VC, parallel exemplars are used directly as the dictionary, so no dictionary learning is performed. To enhance the conversion quality of NMF-based VC, we propose Discriminative Graph-embedded Non-negative Matrix Factorization (DGNMF). Parallel dictionaries of the source and target speakers are estimated discriminatively with DGNMF, based on the phoneme labels of the training data. Experimental results show that the proposed method not only improves conversion quality but also reduces computation time.
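As an illustration of the conventional exemplar-based conversion step the abstract refers to, the following is a minimal sketch and not the DGNMF method proposed in the paper. It assumes magnitude spectra stored as non-negative NumPy arrays, a parallel pair of dictionaries whose columns are frame-aligned source and target exemplars, and the standard KL-divergence multiplicative update with an optional L1 sparsity weight. The function names `estimate_activations` and `convert` and all parameter names are hypothetical.

```python
import numpy as np


def estimate_activations(X, W, n_iter=200, sparsity=0.0, eps=1e-12, seed=0):
    """Estimate non-negative activations H such that X is approximately W @ H.

    The dictionary W is kept fixed (exemplar-based setting). The update is the
    standard multiplicative rule for the KL divergence; `sparsity` is an
    assumed L1 penalty weight on H.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], X.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None]  # W^T 1, one value per dictionary atom
    for _ in range(n_iter):
        ratio = X / (W @ H + eps)
        H *= (W.T @ ratio) / (col_sums + sparsity + eps)
    return H


def convert(X_src, W_src, W_tgt, **kwargs):
    """Map source spectra to target spectra through shared activations.

    Activations are estimated against the source dictionary, then reused with
    the parallel target dictionary (same number of columns, same ordering).
    """
    H = estimate_activations(X_src, W_src, **kwargs)
    return W_tgt @ H
```

In this sketch the dictionaries are simply given; replacing them with learned, more compact parallel dictionaries is the kind of change the abstract associates with lower computation time, since the cost of each update grows with the number of dictionary columns.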


Pages: 292-296

Page count: 5

