An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding

Cited by: 0
Authors
Ensaf Hussein Mohamed
Wessam H. El-Behaidy
Affiliation
[1] Helwan University, Faculty of Computers and Artificial Intelligence
Source
Arabian Journal for Science and Engineering | 2021 / Vol. 46
Keywords
Multi-label classification; Holy Quran; Arabic NLP; Machine learning; Word2vec; TF-IDF
DOI
Not available
Abstract
Automatic theme-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for Muslims and for anyone interested in studying the Quran. Theme-based classification of the Quran can be used in many natural language processing (NLP) applications, such as search engines, data mining, question answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically classifies Quran verses by theme/topic. The model is composed of four phases: pre-processing, data vectorization, binary relevance classification, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and their topics are manually labeled based on the “Mushaf Al-Tajweed” classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; the word2vec model is trained on a collected classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and the ensemble model are compared against “Mushaf Al-Tajweed.” The ensemble model outperforms the three individual classifiers: its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
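The voting step and the Hamming loss metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how ties or thresholds are handled, so the sketch assumes that each tag takes the highest probability any of the three classifiers assigns to it and is kept when that value reaches a hypothetical threshold of 0.5. The function names, tag names, and probabilities are all invented for illustration.

```python
from typing import Dict, Iterable, List, Set


def max_probability_vote(
    per_classifier_probs: List[Dict[str, float]], threshold: float = 0.5
) -> List[str]:
    """Keep each tag whose highest probability across classifiers meets the threshold.

    Assumption: one dict of tag -> probability per classifier, as produced by
    one binary model per tag (binary relevance). The 0.5 cutoff is hypothetical.
    """
    all_tags: Set[str] = set()
    for probs in per_classifier_probs:
        all_tags.update(probs)

    selected = []
    for tag in all_tags:
        # Maximum prediction probability for this tag among the classifiers.
        best = max(probs.get(tag, 0.0) for probs in per_classifier_probs)
        if best >= threshold:
            selected.append(tag)
    return sorted(selected)


def hamming_loss(
    true_tags: Iterable[str], predicted_tags: Iterable[str], label_space: List[str]
) -> float:
    """Fraction of labels in label_space on which truth and prediction disagree."""
    true_set, pred_set = set(true_tags), set(predicted_tags)
    mismatches = sum((tag in true_set) != (tag in pred_set) for tag in label_space)
    return mismatches / len(label_space)


# Made-up probabilities for one verse from three classifiers.
logistic_regression = {"prayer": 0.70, "charity": 0.20, "fasting": 0.40}
support_vector_machine = {"prayer": 0.60, "charity": 0.55, "fasting": 0.10}
random_forest = {"prayer": 0.30, "charity": 0.40, "fasting": 0.45}

predicted = max_probability_vote(
    [logistic_regression, support_vector_machine, random_forest]
)
loss = hamming_loss(["prayer"], predicted, ["prayer", "charity", "fasting"])
print(predicted, loss)
```

In this toy case the ensemble keeps "prayer" and "charity" because at least one classifier is confident in each, while "fasting" never reaches the cutoff. With a single true tag, "prayer", one of the three labels is wrong, which is what the Hamming loss counts.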
Pages: 3519-3529
Number of pages: 10