An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Cited by: 171
Authors
Sisman, Berrak [1]
Yamagishi, Junichi [2, 3]
King, Simon [4]
Li, Haizhou [5]
Affiliations
[1] Singapore Univ Technol & Design SUTD, Informat Syst Technol & Design ISTD Pillar, Singapore 487372, Singapore
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Edinburgh, Edinburgh 1018430, Midlothian, Scotland
[4] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore
Funding
National Research Foundation, Singapore
Keywords
Vocoders; Training data; Speech analysis; Deep learning; Pipelines; Speech synthesis; Training; Voice conversion; speech analysis; speaker characterization; vocoding; voice conversion evaluation; voice conversion challenges; TEXT-TO-SPEECH; GENERATIVE ADVERSARIAL NETWORKS; MAXIMUM-LIKELIHOOD-ESTIMATION; GAUSSIAN MIXTURE MODEL; ABSOLUTE ERROR MAE; SPARSE REPRESENTATION; PROCESSING TECHNIQUES; SPEAKER VERIFICATION; WAVENET VOCODER; NEURAL-NETWORKS;
DOI
10.1109/TASLP.2020.3038524
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech processing techniques, such as speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With the recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. In this article, we provide a comprehensive overview of the state-of-the-art of voice conversion techniques and their performance evaluation methods from the statistical approaches to deep learning, and discuss their promise and limitations. We will also report the recent Voice Conversion Challenges (VCC), the performance of the current state of technology, and provide a summary of the available resources for voice conversion research.
Pages: 132-157
Page count: 26