Exemplar-based voice conversion using joint nonnegative matrix factorization

Cited by: 0
Authors
Zhizheng Wu
Eng Siong Chng
Haizhou Li
Affiliations
[1] Nanyang Technological University, School of Computer Engineering
[2] University of Edinburgh, Centre for Speech Technology Research
[3] Nanyang Technological University, School of Computer Engineering
[4] Nanyang Technological University, Human Language Technology Department, Institute for Infocomm Research, School of Computer Engineering
Source
Multimedia Tools and Applications | 2015 / Vol. 74
Keywords
Speech synthesis; Voice conversion; Exemplar; Sparse representation; Nonnegative matrix factorization; Joint nonnegative matrix factorization
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
Exemplar-based sparse representation is a nonparametric framework for voice conversion. In this framework, a target spectrum is generated as a weighted linear combination of a set of basis spectra, called exemplars, extracted from the training data. The framework uses coupled source-target dictionaries consisting of acoustically aligned source-target exemplars and assumes that the two dictionaries can share the same activation matrix. At runtime, a source spectrogram is factorized as the product of the source dictionary and the common activation matrix, which is then applied to the target dictionary to generate the target spectrogram. In practice, either low-resolution mel-scale filter bank energies or high-resolution spectra are used in the source dictionary. Low-resolution features can capture temporal information without significantly increasing the computational cost and memory footprint, while high-resolution spectra retain fine spectral detail. In this paper, we propose a joint nonnegative matrix factorization technique that finds the common activation matrix using low- and high-resolution features at the same time, so that the activation matrix benefits from both feature types directly. We conducted experiments on the VOICES database to evaluate the proposed method. Both objective and subjective evaluations confirmed its effectiveness.
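
The runtime procedure described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the objective function, so the code assumes a KL-divergence cost with an L1 sparsity penalty, equal weighting of the low- and high-resolution terms, and fixed dictionaries. The names `joint_activations`, `convert`, `A_low`, `A_high`, and `B_target` are hypothetical.

```python
import numpy as np


def joint_activations(X_low, X_high, A_low, A_high,
                      n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate one activation matrix H shared by both feature resolutions.

    Assumed objective (not taken from the paper):
        KL(X_low || A_low H) + KL(X_high || A_high H) + sparsity * sum(H)
    The dictionaries stay fixed; only H is updated.
    """
    # Stacking features and dictionaries row-wise turns the joint problem
    # into a single NMF problem with one fixed dictionary.
    X = np.vstack([X_low, X_high])   # (d_low + d_high, n_frames)
    A = np.vstack([A_low, A_high])   # (d_low + d_high, n_exemplars)

    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps

    # Denominator of the multiplicative update: A^T 1 + sparsity.
    denom = A.sum(axis=0)[:, None] + sparsity

    for _ in range(n_iter):
        ratio = X / (A @ H + eps)      # element-wise X / (A H)
        H *= (A.T @ ratio) / denom     # keeps H nonnegative
    return H


def convert(X_low, X_high, A_low, A_high, B_target, **kwargs):
    """Apply the shared activations to the aligned target dictionary."""
    H = joint_activations(X_low, X_high, A_low, A_high, **kwargs)
    return B_target @ H                # (d_target, n_frames)
```

Here each column of `A_low`, `A_high`, and `B_target` is assumed to hold one aligned exemplar, so column k of all three dictionaries refers to the same training frame pair. Stacking is one simple way to let a single activation matrix explain both resolutions at once; a weighted combination of the two terms would be a natural variation.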
Pages: 9943-9958
Number of pages: 15