Exemplar-based voice conversion using joint nonnegative matrix factorization

Cited by: 0

Authors:

Zhizheng Wu

Eng Siong Chng

Haizhou Li

Affiliations:

[1] Nanyang Technological University, School of Computer Engineering

[2] University of Edinburgh, Centre for Speech Technology Research

[3] Nanyang Technological University, School of Computer Engineering

[4] Nanyang Technological University, Human Language Technology Department, Institute for Infocomm Research, School of Computer Engineering

Source:

Multimedia Tools and Applications | 2015 / Vol. 74

Keywords:

Speech synthesis; Voice conversion; Exemplar; Sparse representation; Nonnegative matrix factorization; Joint nonnegative matrix factorization

DOI:

Not available

Abstract:

Exemplar-based sparse representation is a nonparametric framework for voice conversion. In this framework, a target spectrum is generated as a weighted linear combination of a set of basis spectra, called exemplars, extracted from the training data. The framework adopts coupled source-target dictionaries consisting of acoustically aligned source-target exemplars, and assumes that the two dictionaries can share the same activation matrix. At runtime, a source spectrogram is factorized as the product of the source dictionary and the common activation matrix, and that activation matrix is then applied to the target dictionary to generate the target spectrogram. In practice, either low-resolution mel-scale filter bank energies or high-resolution spectra are adopted in the source dictionary. Low-resolution features can capture temporal information without significantly increasing the computational cost or memory footprint, while high-resolution spectra contain fine spectral details. In this paper, we propose a joint nonnegative matrix factorization technique that finds the common activation matrix using low- and high-resolution features at the same time. In this way, the common activation matrix benefits from both low- and high-resolution features directly. We conducted experiments on the VOICES database to evaluate the performance of the proposed method. Both objective and subjective evaluations confirmed the effectiveness of the proposed method.
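The runtime procedure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the record does not give the objective or the update rules, so this sketch assumes a generalized Kullback-Leibler divergence with an L1 sparsity penalty and standard multiplicative updates, and it realizes the "joint" factorization by stacking the low- and high-resolution source features so that one activation matrix must explain both. The function names, the `weight` and `sparsity` parameters, the feature dimensions, and the random test data are all hypothetical.

```python
import numpy as np

def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate nonnegative H with X ~= A @ H while the dictionary A stays fixed.

    Assumed objective: generalized KL divergence plus an L1 penalty on H,
    minimized with multiplicative updates (an assumption, not from the source).
    """
    rng = np.random.default_rng(0)
    H = rng.random((A.shape[1], X.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None]          # A^T 1, one value per exemplar
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)              # elementwise X / (A H)
        H *= (A.T @ ratio) / (col_sums + sparsity)
    return H

def joint_convert(X_low, X_high, A_low, A_high, B_target, weight=1.0, **kw):
    """Find one activation matrix shared by both source resolutions,
    then apply it to the target dictionary."""
    # Stacking the two feature streams sums their KL terms under a shared H;
    # `weight` (hypothetical) scales the low-resolution term.
    X = np.vstack([weight * X_low, X_high])
    A = np.vstack([weight * A_low, A_high])
    H = estimate_activations(X, A, **kw)
    return B_target @ H, H

# Synthetic stand-ins for real features (dimensions chosen arbitrarily).
rng = np.random.default_rng(1)
n_low, n_high, n_exemplars, n_frames = 23, 257, 50, 8
A_low = rng.random((n_low, n_exemplars))       # low-resolution source exemplars
A_high = rng.random((n_high, n_exemplars))     # high-resolution source exemplars
B_target = rng.random((n_high, n_exemplars))   # aligned target exemplars
mask = rng.random((n_exemplars, n_frames)) < 0.1
H_true = rng.random((n_exemplars, n_frames)) * mask
X_low, X_high = A_low @ H_true, A_high @ H_true

Y, H = joint_convert(X_low, X_high, A_low, A_high, B_target)
print(Y.shape)  # (257, 8)
```

Because the dictionaries are fixed and only the activations are updated, the converted spectrogram `Y` is by construction a nonnegative combination of target exemplars, which matches the weighted-linear-combination view in the abstract.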

Pages: 9943-9958

Page count: 15
