An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Cited by: 171

Authors:

Sisman, Berrak [1]

Yamagishi, Junichi [2,3]

King, Simon [4]

Li, Haizhou [5]

Affiliations:

[1] Singapore Univ Technol & Design SUTD, Informat Syst Technol & Design ISTD Pillar, Singapore 487372, Singapore

[2] Natl Inst Informat, Tokyo, Japan

[3] Univ Edinburgh, Edinburgh 1018430, Midlothian, Scotland

[4] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland

[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Volume 29 / Issue 29

Funding:

National Research Foundation, Singapore;

Keywords:

Vocoders; Training data; Speech analysis; Deep learning; Pipelines; Speech synthesis; Training; Voice conversion; speech analysis; speaker characterization; vocoding; voice conversion evaluation; voice conversion challenges; TEXT-TO-SPEECH; GENERATIVE ADVERSARIAL NETWORKS; MAXIMUM-LIKELIHOOD-ESTIMATION; GAUSSIAN MIXTURE MODEL; ABSOLUTE ERROR MAE; SPARSE REPRESENTATION; PROCESSING TECHNIQUES; SPEAKER VERIFICATION; WAVENET VOCODER; NEURAL-NETWORKS;

DOI:

10.1109/TASLP.2020.3038524

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech processing techniques, such as speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With the recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. In this article, we provide a comprehensive overview of the state-of-the-art of voice conversion techniques and their performance evaluation methods from the statistical approaches to deep learning, and discuss their promise and limitations. We will also report the recent Voice Conversion Challenges (VCC), the performance of the current state of technology, and provide a summary of the available resources for voice conversion research.

Pages: 132-157

Page count: 26
