Multimodal voice conversion based on non-negative matrix factorization

Cited by: 0
Authors
Kenta Masaka
Ryo Aihara
Tetsuya Takiguchi
Yasuo Ariki
Affiliations
[1] Kobe University,Graduate School of System Informatics
[2] Kobe University,Organization of Advanced Science and Technology
Source
EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2015
Keywords
Voice conversion; Multimodal; Image features; Non-negative matrix factorization; Noise robustness
DOI
Not available
Abstract
A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel training data, in which the same texts are uttered by the source and target speakers. The input source signal is decomposed into source exemplars, noise exemplars, and their weights. The converted speech is then constructed from the target exemplars and the weights related to the source exemplars. In this study, we propose multimodal VC that improves the noise robustness of our NMF-based VC method. Furthermore, we introduce a combination weight between audio and visual features and formulate a new cost function to estimate audio-visual exemplars. Using joint audio-visual features as source features improves VC performance compared with a previous audio-input exemplar-based VC method. The effectiveness of the proposed method is confirmed by comparing it with a conventional audio-input NMF-based method and a Gaussian mixture model-based method.
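
The audio-only exemplar-based conversion step summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the cost function, so the standard multiplicative update for the KL divergence with a fixed dictionary is assumed here, and the audio-visual combination weight is not modeled. The function names (`estimate_activations`, `convert`), the dictionary shapes, and the random toy data are all hypothetical.

```python
import numpy as np


def estimate_activations(X, D, n_iter=200, eps=1e-12, seed=0):
    """Estimate non-negative weights H such that X ~= D @ H, with D held fixed.

    Assumption: standard multiplicative update for the KL divergence.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((D.shape[1], X.shape[1])) + eps
    col_sums = D.sum(axis=0)[:, None]  # D^T 1, shape (n_exemplars, 1)
    for _ in range(n_iter):
        ratio = X / (D @ H + eps)
        H *= (D.T @ ratio) / (col_sums + eps)
    return H


def convert(X_noisy, D_src, D_tgt, D_noise):
    """Decompose noisy source features, then rebuild with target exemplars."""
    n_src = D_src.shape[1]
    # Source and noise exemplars form one joint dictionary.
    D = np.hstack([D_src, D_noise])
    H = estimate_activations(X_noisy, D)
    # Keep only the weights tied to source exemplars; noise weights are dropped.
    H_src = H[:n_src]
    # Parallel dictionaries: source exemplar k is paired with target exemplar k.
    return D_tgt @ H_src


# Toy data (random, for illustration only): 8 feature bins, 20 frames.
rng = np.random.default_rng(1)
D_src = rng.random((8, 5))    # hypothetical source exemplars
D_tgt = rng.random((8, 5))    # hypothetical paired target exemplars
D_noise = rng.random((8, 2))  # hypothetical noise exemplars
X_noisy = D_src @ rng.random((5, 20)) + D_noise @ rng.random((2, 20))

Y = convert(X_noisy, D_src, D_tgt, D_noise)
print(Y.shape)
```

Because the source and target dictionaries are built from parallel utterances, the weights estimated on the source side can be reused directly with the target exemplars; discarding the noise rows of the weight matrix is what gives the approach its noise robustness in this sketch.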