Automated Amharic News Categorization Using Deep Learning Models

Cited by: 10
Authors
Endalie, Demeke [1]
Haile, Getamesay [1]
Affiliations
[1] Jimma Inst Technol, Fac Comp & Informat, Jimma, Ethiopia
Keywords
Multilayer neural networks;
DOI
10.1155/2021/3774607
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09
Abstract
For decades, machine learning techniques have been used to process Amharic texts. The potential of deep learning for Amharic document classification has not yet been exploited, owing to a lack of language resources. In this paper, we present a deep learning model for Amharic news document classification. The proposed model uses fastText to generate text vectors that represent the semantic meaning of texts and address the limitations of traditional methods. The text-vector matrix is then fed into the embedding layer of a convolutional neural network (CNN), which automatically extracts features. We conduct experiments on a dataset with six news categories, and our approach achieves a classification accuracy of 93.79%. We compare our method with well-known machine learning algorithms such as support vector machine (SVM), multilayer perceptron (MLP), decision tree (DT), XGBoost (XGB), and random forest (RF), and it achieves good results.
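The pipeline the abstract describes (word vectors stacked into a matrix, a convolution that extracts features, then a six-way classification) can be illustrated with a minimal pure-Python sketch. Everything concrete here is an assumption made for illustration: the toy vector size, the filter count, the window width, and the random weights are hypothetical, random numbers stand in for real fastText vectors, and no training is performed. It is not the authors' implementation.

```python
import math
import random

NUM_CLASSES = 6   # the abstract reports six news categories
EMBED_DIM = 4     # toy size (assumption); real fastText vectors are much larger
KERNEL_SIZE = 2   # hypothetical convolution window, in tokens
NUM_FILTERS = 3   # hypothetical number of convolution filters


def conv1d_relu(matrix, kernel, bias):
    """Slide one filter over consecutive token windows and apply ReLU."""
    outputs = []
    for start in range(len(matrix) - len(kernel) + 1):
        total = bias
        for offset, kernel_row in enumerate(kernel):
            token_vector = matrix[start + offset]
            total += sum(w * x for w, x in zip(kernel_row, token_vector))
        outputs.append(max(0.0, total))
    return outputs


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(matrix, filters, biases, dense_weights, dense_bias):
    """Convolution, max-over-time pooling, then a dense softmax layer."""
    features = [max(conv1d_relu(matrix, k, b)) for k, b in zip(filters, biases)]
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(dense_weights, dense_bias)
    ]
    return softmax(scores)


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Stand-in for the fastText vectors of a five-token news document.
document = random_matrix(rng, 5, EMBED_DIM)

# Untrained, randomly initialised parameters (assumption: illustration only).
filters = [random_matrix(rng, KERNEL_SIZE, EMBED_DIM) for _ in range(NUM_FILTERS)]
biases = [0.0] * NUM_FILTERS
dense_weights = random_matrix(rng, NUM_CLASSES, NUM_FILTERS)
dense_bias = [0.0] * NUM_CLASSES

probs = classify(document, filters, biases, dense_weights, dense_bias)
predicted = probs.index(max(probs))
print(f"predicted category index: {predicted}")
```

With untrained weights the predicted index is arbitrary; the sketch only shows how a token-by-dimension matrix flows through convolution and pooling to a probability per category.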
Pages: 9