A MULTI-GENRE URDU BROADCAST SPEECH RECOGNITION SYSTEM

Times Cited: 3
Authors
Khan, Erbaz [1]
Rauf, Sahar [1]
Adeeba, Farah [2]
Hussain, Sarmad [1]
Affiliations
[1] Univ Engn & Technol, Al Khawarizmi Inst Comp Sci, Ctr Language Engn, Lahore, Pakistan
[2] Univ Engn & Technol, Dept Comp Sci, New Campus, Ksk, Pakistan
Source
2021 24TH CONFERENCE OF THE ORIENTAL COCOSDA INTERNATIONAL COMMITTEE FOR THE CO-ORDINATION AND STANDARDISATION OF SPEECH DATABASES AND ASSESSMENT TECHNIQUES (O-COCOSDA) | 2021
Keywords
BC; multi-genre; Urdu; corpus; speech recognition
DOI
10.1109/O-COCOSDA202152914.2021.9660552
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline Classification Codes
100104; 100213
Abstract
This paper reports the development of a multi-genre Urdu Broadcast (BC) speech corpus and a Large Vocabulary Continuous Speech Recognition (LVCSR) system. A 98-hour BC speech corpus from 453 speakers is collected and annotated. For acoustic modeling, a Time-Delay Neural Network (TDNN) is trained using alignments from a previously trained Gaussian Mixture Model-Hidden Markov Model (GMM-HMM) system. For language modeling, 3-gram, 4-gram, and Recurrent Neural Network (RNN) based models are built on a text corpus of 188 million words. The models are evaluated on 4.3 hours of unseen multi-genre BC speech, and the best Word Error Rate (WER), 18.59%, is achieved with the RNN-based Language Model (LM). Moreover, a detailed word error analysis compares the errors made by humans with those made by the Automatic Speech Recognition (ASR) system. The results show similar patterns of word misrecognition for humans and the ASR system.
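The abstract reports results as Word Error Rate (WER). As a reminder of how this metric is conventionally computed, here is a minimal Python sketch, assuming the standard definition WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment of whitespace-tokenized words and N is the number of reference words. This is not the scoring tool used by the authors (the record does not name one), and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    # Total errors divided by the number of reference words.
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted against the reference length, WER can exceed 100% when the hypothesis is much longer than the reference. A human-versus-ASR comparison of the kind described in the abstract would typically inspect the individual S, D, and I operations in the alignment rather than only the aggregate rate.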
Pages: 25-30
Page count: 6