An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Cited by: 171
Authors
Sisman, Berrak [1]
Yamagishi, Junichi [2, 3]
King, Simon [4]
Li, Haizhou [5]
Affiliations
[1] Singapore Univ Technol & Design SUTD, Informat Syst Technol & Design ISTD Pillar, Singapore 487372, Singapore
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Edinburgh, Edinburgh 1018430, Midlothian, Scotland
[4] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
Funding
National Research Foundation, Singapore
Keywords
Vocoders; Training data; Speech analysis; Deep learning; Pipelines; Speech synthesis; Training; Voice conversion; speech analysis; speaker characterization; vocoding; voice conversion evaluation; voice conversion challenges; TEXT-TO-SPEECH; GENERATIVE ADVERSARIAL NETWORKS; MAXIMUM-LIKELIHOOD-ESTIMATION; GAUSSIAN MIXTURE MODEL; ABSOLUTE ERROR MAE; SPARSE REPRESENTATION; PROCESSING TECHNIQUES; SPEAKER VERIFICATION; WAVENET VOCODER; NEURAL-NETWORKS;
DOI
10.1109/TASLP.2020.3038524
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech processing techniques, such as speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With the recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. In this article, we provide a comprehensive overview of the state-of-the-art of voice conversion techniques and their performance evaluation methods from the statistical approaches to deep learning, and discuss their promise and limitations. We will also report the recent Voice Conversion Challenges (VCC), the performance of the current state of technology, and provide a summary of the available resources for voice conversion research.
Pages: 132-157
Page count: 26