Voice Conversion Can Improve ASR in Very Low-Resource Settings

Cited by: 4

Authors:

Baas, Matthew [1]

Kamper, Herman [1]

Affiliations:

[1] Stellenbosch Univ, MediaLab, E&E Engn, Stellenbosch, South Africa

Source:

INTERSPEECH 2022 | 2022

Funding:

National Research Foundation, Singapore;

Keywords:

voice conversion; data augmentation; low-resource speech processing; speech recognition;

DOI:

10.21437/Interspeech.2022-112

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Voice conversion (VC) could be used to improve speech recognition systems in low-resource languages by using it to augment limited training data. However, VC has not been widely used for this purpose because of practical issues such as compute speed and limitations when converting to and from unseen speakers. Moreover, it is still unclear whether a VC model trained on one well-resourced language can be applied to speech from another low-resource language for the aim of data augmentation. In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition. We combine several recent techniques to design and train a practical VC system in English, and then use this system to augment data for training speech recognition models in several low-resource languages. When using a sensible amount of VC-augmented data, speech recognition performance is improved in all four low-resource languages considered. We also show that VC-based augmentation is superior to SpecAugment (a widely used signal processing augmentation method) in the low-resource languages considered.


Pages: 3513-3517

Page count: 5

