Voice Conversion Can Improve ASR in Very Low-Resource Settings

Cited by: 4
Authors
Baas, Matthew [1]
Kamper, Herman [1]
Affiliations
[1] Stellenbosch Univ, MediaLab, E&E Engn, Stellenbosch, South Africa
Source
INTERSPEECH 2022 | 2022
Funding
National Research Foundation, Singapore;
Keywords
voice conversion; data augmentation; low-resource speech processing; speech recognition;
DOI
10.21437/Interspeech.2022-112
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Voice conversion (VC) could improve speech recognition systems in low-resource languages by augmenting limited training data. However, VC has not been widely used for this purpose because of practical issues such as compute speed and limitations when converting to and from unseen speakers. Moreover, it is still unclear whether a VC model trained on one well-resourced language can be applied to speech from another, low-resource language for data augmentation. In this work we assess whether a VC system can be used cross-lingually to improve low-resource speech recognition. We combine several recent techniques to design and train a practical VC system in English, and then use this system to augment data for training speech recognition models in several low-resource languages. When using a sensible amount of VC-augmented data, speech recognition performance improves in all four low-resource languages considered. We also show that VC-based augmentation is superior to SpecAugment (a widely used signal processing augmentation method) in the low-resource languages considered.
Pages: 3513-3517
Page count: 5
  • [10] Graves A., 2006, MACHINE LEARNING P 2, P369, DOI [DOI 10.1145/1143844.1143891, 10.1145/1143844.1143891]