Arabic Speech Classification Method Based on Padding and Deep Learning Neural Network

Cited by: 13
Authors
Asroni, Asroni [1]
Ku-Mahamud, Ku Ruhana [2]
Damarjati, Cahya [1]
Slamat, Hasan Basri [1]
Affiliations
[1] Univ Muhammadiyah Yogyakarta, Yogyakarta, Indonesia
[2] Univ Utara Malaysia, Sintok, Kedah, Malaysia
Keywords
Arabic alphabet; COVID-19; Deep learning; Spectrogram; Speech classification
DOI
10.21123/bsj.2021.18.2(Suppl.).0925
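Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract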
Deep learning convolutional neural networks (CNNs) have been widely used to recognize or classify voice. Various techniques have been used together with CNNs to prepare voice data before training the classification model. However, not all models produce good classification accuracy, as there are many types of voice or speech. Arabic alphabet pronunciation is one such type, and accurate pronunciation is required when learning to read the Qur'an. Thus, processing the pronunciation data and training on the processed data require a specific approach. To address this, a method based on padding and a deep learning CNN is proposed to evaluate the pronunciation of the Arabic alphabet. Voice data recorded from six school children are used to test the performance of the proposed method. The padding technique is used to augment the voice data before the data are fed to the CNN to develop the classification model. In addition, three other feature extraction techniques are introduced for comparison with the proposed padding-based method. The performance of the proposed method with padding is on par with the spectrogram but better than the mel-spectrogram and mel-frequency cepstral coefficients (MFCCs). Results also show that the proposed method is able to distinguish Arabic letters that are difficult to pronounce. The proposed method with padding may be extended to assess pronunciation ability beyond the Arabic alphabet.
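The abstract does not say how the padding is applied, so the following is only a minimal sketch of one common reading: recordings of different lengths are right-padded with zeros to a shared length so they can be stacked into a fixed-size input for a CNN. The function names `pad_to_length` and `pad_batch`, the choice of zero padding on the right, and the use of the longest clip as the target length are all assumptions for illustration, not the authors' documented procedure.

```python
import numpy as np


def pad_to_length(signal, target_len):
    """Right-pad a 1-D waveform with zeros, or truncate it, to target_len samples.

    Hypothetical helper: zero padding on the right is an assumption,
    not a detail taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float32)
    if signal.shape[0] >= target_len:
        return signal[:target_len]
    missing = target_len - signal.shape[0]
    return np.pad(signal, (0, missing), mode="constant")


def pad_batch(signals):
    """Pad every waveform to the length of the longest one and stack them."""
    target_len = max(len(s) for s in signals)
    return np.stack([pad_to_length(s, target_len) for s in signals])


# Three toy "recordings" of different lengths (placeholder data, not real audio).
clips = [np.ones(3), np.ones(5), np.ones(2)]
batch = pad_batch(clips)
```

After padding, `batch` is a single two-dimensional array with one row per recording, which is the fixed-shape form that a CNN input layer generally expects.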
Pages: 925-936
Number of pages: 12