An Optimized Arabic Multilabel Text Classification Approach Using Genetic Algorithm and Ensemble Learning

Cited by: 3
Authors
Alzanin, Samah M. [1 ]
Gumaei, Abdu [1 ]
Haque, Md Azimul [2 ]
Muaad, Abdullah Y. [3 ]
Affiliations
[1] Prince Sattam Bin Abdulaziz Univ, Coll Comp Engn & Sci, Dept Comp Sci, Al Kharj 11942, Saudi Arabia
[2] Utkal Univ, Dept Commerce, Bhubaneswar 751004, India
[3] Univ Mysore, Dept Studies Comp Sci, Mysore 570006, India
Source
APPLIED SCIENCES-BASEL | 2023 / Volume 13 / Issue 18
Keywords
Arabic language; genetic algorithm; ensemble learning; multi-label; text classification; FEATURE-SELECTION; CATEGORIZATION; SPEECH;
DOI
10.3390/app131810264
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Multilabel classification of Arabic text is an important task for understanding and analyzing social media content. It enables the categorization and monitoring of social media posts, the detection of important events, the identification of trending topics, and insight into public opinion and sentiment. However, multilabel classification of Arabic content is challenging because of the high dimensionality of the text representation and the unique characteristics of the Arabic language. In this paper, an effective approach is proposed for Arabic multilabel classification using a metaheuristic Genetic Algorithm (GA) and ensemble learning. The approach explores the effect of Arabic text representation on classification performance using both Bag of Words (BOW) and Term Frequency-Inverse Document Frequency (TF-IDF) methods. Moreover, it compares ensemble learning methods, namely the Extra Trees Classifier (ETC) and the Random Forest Classifier (RFC), against a Logistic Regression Classifier (LRC) used both as a single classifier and as an ensemble classifier. We evaluate the approach on MAWQIF, a new public dataset and the first multilabel Arabic dataset for target-specific stance detection. The experimental results demonstrate that the proposed approach outperforms the related work on the same dataset, achieving F1-scores of 80.88% for sentiment classification and 68.76% for multilabel tasks. In addition, data augmentation combined with feature selection improves the F1-score of the ETC from 65.62% to 68.80%. The study shows the ability of GA-based feature selection with ensemble learning to improve the classification of multilabel Arabic text.
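The GA-based feature selection named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the operators, rates, or fitness function, so tournament selection, one-point crossover, bit-flip mutation, elitism, all default parameter values, and the names `genetic_feature_selection`, `toy_fitness`, and `INFORMATIVE` are assumptions made for illustration. Each individual is a binary mask over feature columns (for example, BOW or TF-IDF terms).

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search for a binary feature mask that maximizes fitness(mask).

    All operators and default rates here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one per feature column.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]

        def tournament():
            # Keep the fitter of two randomly drawn individuals.
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        # Elitism: the best mask so far always survives.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                # One-point crossover between the two parents.
                cut = rng.randint(1, n_features - 1)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Bit-flip mutation toggles individual features.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)

        population = next_population
        best = max(population, key=fitness)
    return best


# Hypothetical stand-in fitness: rewards three "informative" columns and
# penalizes mask size. A real setup would instead score a classifier
# trained on the selected columns.
INFORMATIVE = {0, 3, 7}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask = genetic_feature_selection(n_features=12, fitness=toy_fitness)
selected = [i for i, keep in enumerate(best_mask) if keep]
print("selected feature indices:", selected)
```

In a setting like the one the abstract describes, `toy_fitness` would plausibly be replaced by a validation F1-score of an ensemble classifier trained only on the columns where the mask is 1; that substitution is likewise an assumption, not a detail taken from the record.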
Pages: 21
Related papers
50 in total
  • [1] Genetic Algorithm and Ensemble Learning Aided Text Classification using Support Vector Machines
    Chauhan, Anshumaan
    Agarwal, Ayushi
    Sulthana, Razia
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (08) : 260 - 267
  • [2] Text Authorship Identification Based On Ensemble Learning and Genetic Algorithm Combination in Turkish Text
    Gullu, Merve
    Polat, Huseyin
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2022, 25 (03): : 1287 - 1297
  • [3] Text classification using genetic algorithm oriented latent semantic features
    Uysal, Alper Kursat
    Gunal, Serkan
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (13) : 5938 - 5947
  • [4] Arabic Text Classification Using Deep Learning Technics
    Boukil, Samir
    Biniz, Mohamed
    El Adnani, Fatiha
    Cherrat, Loubna
    El Moutaouakkil, Abd Elmaj Id
    INTERNATIONAL JOURNAL OF GRID AND DISTRIBUTED COMPUTING, 2018, 11 (09): : 103 - 114
  • [5] Text length considered adaptive bagging ensemble learning algorithm for text classification
    Youwei Wang
    Jiangchun Liu
    Lizhou Feng
    Multimedia Tools and Applications, 2023, 82 : 27681 - 27706
  • [6] Text length considered adaptive bagging ensemble learning algorithm for text classification
    Wang, Youwei
    Liu, Jiangchun
    Feng, Lizhou
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (18) : 27681 - 27706
  • [7] DCGAEL: An Optimized Ensemble Learning using a Discrete-Continuous Bi-Level Genetic Algorithm
    Adibi, Mohammad Amin
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2022, 38 (04) : 761 - 774
  • [8] Sentiment Analysis for Arabic Text Using Ensemble Learning
    Al-Saqqa, Samar
    Obeid, Nadim
    Awajan, Arafat
    2018 IEEE/ACS 15TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2018,
  • [9] Arabic Text Classification: A Comparative Approach Using a Big Dataset
    Madhfar, Mokhtar Ali Hasan
    Al-Hagery, Mohammed Abdullah Hassan
    2019 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCES (ICCIS), 2019, : 465 - 469
  • [10] An Optimized Framework for Cancer Classification Using Deep Learning and Genetic Algorithm
    Sharma, Aman
    Rani, Rinkle
    JOURNAL OF MEDICAL IMAGING AND HEALTH INFORMATICS, 2017, 7 (08) : 1851 - 1856