Dari Speech Classification Using Deep Convolutional Neural Network

Cited by: 0
Authors
Dawodi, Mursal [1]
Baktash, Jawid Ahamd [2]
Wada, Tomohisa [1]
Alam, Najwa [2]
Joya, Mohammad Zarif [2]
Affiliations
[1] Univ Ryukyus, Sch Sci & Engn, Okinawa, Japan
[2] Kabul Univ, Fac Comp Sci, Kabul, Afghanistan
Source
2020 IEEE INTERNATIONAL IOT, ELECTRONICS AND MECHATRONICS CONFERENCE (IEMTRONICS 2020) | 2020
Keywords
speech classification; Dari; convolutional neural network; deep neural network; speech recognition;
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Speech recognition is currently one of the most active research topics, and many recent papers have demonstrated the power of deep neural networks in speech recognition systems. The main purpose of this paper is to recognize isolated words in Dari speech using deep learning algorithms. It is one of the early studies on Dari speech recognition and focuses on single-word recognition. The study uses our own audio files as its dataset because no Dari speech database was available at the time. A convolutional neural network (CNN) is implemented to automatically recognize isolated words in Dari, using Mel-frequency cepstral coefficients (MFCC) as the input from which the network learns feature representations during training. The model achieved 88.2% accuracy on the test set. The results show that the model predicts samples of words seen during training with high accuracy. However, it struggles to generalize to words outside the training data and to very noisy examples.
Pages: 110-113
Page count: 4