An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning

Cited by: 171

Authors:

Sisman, Berrak [1]

Yamagishi, Junichi [2, 3]

King, Simon [4]

Li, Haizhou [5]

Affiliations:

[1] Singapore Univ Technol & Design SUTD, Informat Syst Technol & Design ISTD Pillar, Singapore 487372, Singapore

[2] Natl Inst Informat, Tokyo, Japan

[3] Univ Edinburgh, Edinburgh 1018430, Midlothian, Scotland

[4] Univ Edinburgh, Edinburgh EH8 9AB, Midlothian, Scotland

[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 117583, Singapore

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2021 / Vol. 29 / Issue 29

Funding:

National Research Foundation, Singapore;

Keywords:

Vocoders; Training data; Speech analysis; Deep learning; Pipelines; Speech synthesis; Training; Voice conversion; speech analysis; speaker characterization; vocoding; voice conversion evaluation; voice conversion challenges; TEXT-TO-SPEECH; GENERATIVE ADVERSARIAL NETWORKS; MAXIMUM-LIKELIHOOD-ESTIMATION; GAUSSIAN MIXTURE MODEL; ABSOLUTE ERROR MAE; SPARSE REPRESENTATION; PROCESSING TECHNIQUES; SPEAKER VERIFICATION; WAVENET VOCODER; NEURAL-NETWORKS;

DOI:

10.1109/TASLP.2020.3038524

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Speaker identity is one of the important characteristics of human speech. In voice conversion, we change the speaker identity from one to another, while keeping the linguistic content unchanged. Voice conversion involves multiple speech processing techniques, such as speech analysis, spectral conversion, prosody conversion, speaker characterization, and vocoding. With the recent advances in theory and practice, we are now able to produce human-like voice quality with high speaker similarity. In this article, we provide a comprehensive overview of the state-of-the-art of voice conversion techniques and their performance evaluation methods from the statistical approaches to deep learning, and discuss their promise and limitations. We will also report the recent Voice Conversion Challenges (VCC), the performance of the current state of technology, and provide a summary of the available resources for voice conversion research.

Pages: 132-157

Number of pages: 26
