A Survey of Recent DNN Architectures on the TIMIT Phone Recognition Task

Cited by: 9

Authors:

Michalek, Josef ^{[1]}

Vanek, Jan ^{[1]}

Affiliations:

[1] Univ West Bohemia, Univ 8, Plzen 30100, Czech Republic

Source:

TEXT, SPEECH, AND DIALOGUE (TSD 2018) | 2018 / Vol. 11107

Keywords:

Neural networks; Acoustic model; Survey; Review; TIMIT; LSTM; Phone recognition; NETWORKS;

DOI:

10.1007/978-3-030-00794-2_47

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this survey paper, we evaluate several recent deep neural network (DNN) architectures on the TIMIT phone recognition task. We chose the TIMIT corpus because of its popularity and broad availability in the community. It also simulates a low-resource scenario, which is useful for minor languages. We prefer the phone recognition task because it is much more sensitive to acoustic model quality than a large vocabulary continuous speech recognition (LVCSR) task. In recent years, many published DNN papers have reported results on TIMIT. However, the reported phone error rates (PERs) were often much higher than the PER of a simple feed-forward (FF) DNN. That was the main motivation of this paper: to provide baseline DNNs with open-source scripts so that future papers can easily replicate the baseline results with the lowest possible PERs. To our knowledge, the best PER achieved in this survey is better than the best published PER to date.

Pages: 436-444

Page count: 9

