Svarah: Evaluating English ASR Systems on Indian Accents

Cited by: 1
Authors
Javed, Tahir [1, 2]
Joshi, Sakshi [2 ]
Nagarajan, Vignesh [2 ]
Sundaresan, Sai [2 ]
Nawale, Janki [2 ]
Raman, Abhigyan [2 ]
Bhogale, Kaushal [2 ]
Kumar, Pratyush [2, 3]
Khapra, Mitesh M. [2 ]
Affiliations
[1] Indian Inst Technol Madras, Chennai, India
[2] AI4Bharat, Chennai, India
[3] Microsoft, Redmond, WA USA
Source
INTERSPEECH 2023 | 2023
Keywords
non-native speech recognition; Indian accents; diversity and inclusion
DOI
10.21437/Interspeech.2023-2588
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
India is the second-largest English-speaking country in the world, with a speaker base of roughly 130 million. It is therefore imperative that automatic speech recognition (ASR) systems for English be evaluated on Indian accents. Unfortunately, Indian speakers are very poorly represented in existing English ASR benchmarks such as LibriSpeech, Switchboard, and the Speech Accent Archive. In this work, we address this gap by creating Svarah, a benchmark that contains 9.6 hours of transcribed English audio from 117 speakers across 65 geographic locations throughout India, resulting in a diverse range of accents. Svarah comprises both read speech and spontaneous conversational data and covers domains such as history, culture, and tourism, ensuring a diverse vocabulary. We evaluate six open-source ASR models and two commercial ASR systems on Svarah and show that there is clear scope for improvement on Indian accents.
Pages: 5087-5091
Page count: 5