Uncertainty estimation for a speech recognition system

Citations: 0
Authors
Morales-Munoz, Walter [1]
Calderon-Ramirez, Saul [1]
Affiliation
[1] Inst Tecnol Costa Rica, Cartago, Costa Rica
Source
TECNOLOGIA EN MARCHA | 2024 / Vol. 37
Keywords
Uncertainty; Speech Recognition; ASR; Whisper; Monte Carlo Dropout;
DOI
10.18845/tm.v37i7.7305
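Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences, General];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract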
Whisper is a speech recognition system developed by OpenAI and trained on 680,000 hours of multilingual and multitask supervised data collected from the web. This research aims to adapt and apply Monte Carlo Dropout, together with the Levenshtein distance, to estimate the uncertainty score of this system, using labeled Spanish audio data contaminated with a certain amount of noise. Preliminary results show a linear relationship between the uncertainty estimate and the Word Error Rate (WER) of the transcriptions. Furthermore, the number of insertions or omissions in the transcriptions tends to be low.
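The abstract does not spell out how the Levenshtein distance is turned into an uncertainty score, so the following is only a minimal sketch under stated assumptions. It assumes that Monte Carlo Dropout yields several transcriptions of the same audio (one per stochastic forward pass with dropout left active) and that their disagreement is summarized as the mean pairwise normalized Levenshtein distance. The function names `levenshtein`, `wer`, and `mc_dropout_uncertainty`, and the choice of pairwise averaging, are hypothetical and not taken from the paper. The Whisper inference step itself is omitted.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return levenshtein(ref, hyp) / max(len(ref), 1)


def mc_dropout_uncertainty(transcripts):
    """Assumed score: mean pairwise character-level Levenshtein distance,
    each pair normalized by the length of its longer transcript."""
    pairs = list(combinations(transcripts, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += levenshtein(a, b) / max(len(a), len(b), 1)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical outputs of three stochastic passes over one noisy clip.
    samples = ["el gato come pescado", "el gato come pescado", "el pato come pescado"]
    print("uncertainty:", mc_dropout_uncertainty(samples))
    print("WER:", wer("el gato come pescado", samples[2]))
```

A score of 0 means every pass produced the same transcription; larger values mean the passes disagree more. Under the linear relationship reported in the abstract, a higher score would be expected to accompany a higher WER.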
Pages: 97-103
Page count: 7