Building and benchmarking an Arabic Speech Commands dataset for small-footprint keyword spotting

Citations: 11
Authors
Ghandoura, Abdulkader [1]
Hjabo, Farouk [2]
Al Dakkak, Oumayma [1,3]
Affiliations
[1] Syrian Virtual Univ, Damascus, Syria
[2] Innopolis Univ, Innopolis, Russia
[3] Higher Inst Appl Sci & Technol, Damascus, Syria
Keywords
Arabic Speech dataset; Speech recognition; Keyword spotting; Machine learning; Deep learning
DOI
10.1016/j.engappai.2021.104267
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The introduction of the Google Speech Commands dataset accelerated research and resulted in a variety of new deep learning approaches that address keyword spotting tasks. The main contribution of this work is the building of an Arabic Speech Commands dataset, a counterpart to Google's dataset. Our dataset consists of 12,000 instances, collected from 30 contributors and grouped into 40 keywords. We also report different experiments to benchmark this dataset using classical machine learning and deep learning approaches, the best of which is a Convolutional Neural Network with Mel-Frequency Cepstral Coefficients that achieved an accuracy of approximately 98%. Additionally, we point out some key ideas to be considered in such tasks.
Pages: 7