MEASURING THE IMPACT OF DOMAIN FACTORS IN SELF-SUPERVISED PRE-TRAINING

Cited by: 3
Authors
Sanabria, Ramon [1]
Hsu, Wei-Ning [2]
Baevski, Alexei [2]
Auli, Michael [2]
Affiliations
[1] Univ Edinburgh, Edinburgh, Midlothian, Scotland
[2] Meta AI, New York, NY USA
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW | 2023
Keywords
speech recognition; self-supervised learning; domain mismatch;
DOI
10.1109/ICASSPW59220.2023.10193184
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. Previous work explores the effect of domain mismatch in automatic speech recognition between pre-training and fine-tuning as a whole [1] but does not dissect the contribution of individual factors. In this paper, we present a controlled study to better understand the effect of such factors on the performance of pre-trained representations on automatic speech recognition. To do so, we pre-train models either on modified natural speech or synthesized audio, with a single domain factor modified, and then measure performance after fine-tuning. Results show that phonetic domain factors play an important role during pre-training while grammatical and syntactic factors are far less important. To our knowledge, this is the first study to better understand the domain characteristics of pre-trained sets in self-supervised pre-training for speech.
Pages: 5
Related papers
25 in total
[1]  
[Anonymous], 2017, CSTR VCTK CORPUS ENG
[2]  
[Anonymous], 2004, LREC
[3]  
Baevski A, 2020, ADV NEUR IN, V33
[4]  
Berrebbi D., 2023, ICASSP
[5]   An Unsupervised Autoregressive Model for Speech Representation Learning [J].
Chung, Yu-An ;
Hsu, Wei-Ning ;
Tang, Hao ;
Glass, James .
INTERSPEECH 2019, 2019, :146-150
[6]  
Conneau Alexis, 2020, INTERSPEECH
[7]  
github, MONTR FORC AL
[8]  
github, 2021, WEIGHTS US SYNTH SPE
[9]  
Godfrey J. J., 1992, ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech and Signal Processing (Cat. No.92CH3103-9), P517, DOI 10.1109/ICASSP.1992.225858
[10]  
Graves A., 2006, P 23 INT C MACHINE L, P369, DOI 10.1145/1143844.1143891