Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks

Cited by: 6
Authors
Farooq, Muhammad Umar [1]
Adeeba, Farah [1]
Rauf, Sahar [1]
Hussain, Sarmad [1]
Affiliation
[1] Univ Engn & Technol, Ctr Language Engn, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan
Source
INTERSPEECH 2019 | 2019
Keywords
Urdu; ASR; GMM-HMM; DNN-HMM; TDNN; BLSTM; RNNLM; LVCSR;
DOI
10.21437/Interspeech.2019-2629
Chinese Library Classification number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Developing a Large Vocabulary Continuous Speech Recognition (LVCSR) system is a cumbersome task, especially for low-resource languages. Urdu is the national language and lingua franca of Pakistan, with 100 million speakers worldwide. Owing to resource scarcity, limited work has been done on Urdu speech recognition. This paper presents the collection of an Urdu speech corpus and the development of an Urdu speech recognition system. The Urdu LVCSR system is developed using 300 hours of read speech data with a vocabulary size of 199K words. Microphone speech is recorded from 1671 Urdu and Punjabi speakers in both indoor and outdoor environments. Several acoustic modeling techniques are investigated: Gaussian Mixture Model based Hidden Markov Models (GMM-HMM), Time Delay Neural Networks (TDNN), Long Short-Term Memory (LSTM) networks, and Bidirectional Long Short-Term Memory (BLSTM) networks. Cross-entropy and Lattice-Free Maximum Mutual Information (LF-MMI) objective functions are employed during acoustic modeling. In addition, a Recurrent Neural Network Language Model (RNNLM) is used for re-scoring. The developed speech recognition system is evaluated on 9.5 hours of collected test data and achieves a minimum Word Error Rate (WER) of 13.50%.
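The abstract reports its result as a Word Error Rate. As a minimal sketch of how that metric is conventionally defined (word-level edit distance divided by the number of reference words, expressed as a percentage), the following Python function may help. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring tool used by the authors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words.

    Illustrative sketch only: tokens are split on whitespace, which is an
    assumption and may not match the paper's text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)
```

Under this definition, a WER of 13.50% corresponds to roughly 13.5 word errors per 100 reference words. Because insertions count as errors, the value can exceed 100% for very poor hypotheses.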
Pages: 2978 - 2982
Number of pages: 5