A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

Cited by: 9
Authors
Michalek, Josef [1]
Vanek, Jan [1]
Affiliations
[1] Univ West Bohemia, Univ 8, Plzen 30100, Czech Republic
Source
TEXT, SPEECH, AND DIALOGUE (TSD 2018) | 2018 / Vol. 11107
Keywords
Neural networks; Acoustic model; Survey; Review; TIMIT; LSTM; Phone recognition; NETWORKS;
DOI
10.1007/978-3-030-00794-2_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this survey paper, we evaluate several recent deep neural network (DNN) architectures on the TIMIT phone recognition task. We chose the TIMIT corpus because of its popularity and broad availability in the community. It also simulates a low-resource scenario, which is relevant for minor languages. We prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many published DNN papers have reported results on TIMIT. However, the reported phone error rates (PERs) were often much higher than the PER of a simple feed-forward (FF) DNN. That was the main motivation for this paper: to provide baseline DNNs, with open-source scripts, so that future papers can easily replicate the baseline results with the lowest possible PERs. To our knowledge, the best PER achieved in this survey is better than the best published PER to date.
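The abstract compares systems by phone error rate (PER). As a rough illustration only, PER is conventionally computed as the number of substitutions, deletions, and insertions needed to turn the recognized phone sequence into the reference, divided by the number of reference phones. The sketch below assumes that standard edit-distance definition; the function names and the toy phone labels are hypothetical and are not taken from the paper or its scripts, and the record does not describe the paper's actual scoring setup.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    # prev[j] is the distance between the ref prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs, hyps):
    """Corpus-level PER: total edit errors divided by total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


# Toy data (hypothetical): one deletion in the first utterance,
# one substitution in the second, 10 reference phones in total.
refs = [["sil", "k", "ae", "t", "sil"], ["sil", "d", "ao", "g", "sil"]]
hyps = [["sil", "k", "ae", "sil"], ["sil", "d", "aa", "g", "sil"]]
per = phone_error_rate(refs, hyps)
print(f"PER = {per:.1%}")  # PER = 20.0%
```

Errors are pooled over the whole test set before dividing, so long utterances weigh more than short ones; averaging per-utterance rates would give a different number.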
Pages: 436-444
Number of pages: 9