An Ensemble Multi-label Themes-Based Classification for Holy Qur’an Verses Using Word2Vec Embedding

Cited by: 0

Authors:

Ensaf Hussein Mohamed

Wessam H. El-Behaidy

Affiliation:

[1] Helwan University, Faculty of Computers and Artificial Intelligence

Source:

Arabian Journal for Science and Engineering | 2021 / Vol. 46

Keywords:

Multi-label classification; Holy Quran; Arabic NLP; Machine learning; Word2vec; TF-IDF;

Abstract:

Automatic theme-based classification of Quran verses is the process of assigning verses to predefined categories or themes. It is an essential task for all Muslims and for people interested in studying the Quran. Theme-based classification of the Quran could be used in many natural language processing (NLP) applications, such as search engines, data mining, question-answering systems, and information retrieval. This paper presents an ensemble multi-label classification model that automatically identifies and classifies Quran verses based on their themes/topics. The model is composed of four phases: pre-processing, data vectorization, a binary relevance classifier, and a voting module. First, the verses of the second chapter of the Quran (Al-Baqarah) are tokenized and normalized, and their topics are manually labeled based on the “Mushaf Al-Tajweed” classification. Second, the verses are converted into feature vectors using term frequency-inverse document frequency (TF-IDF) and word2vec. Word2vec is used to capture the semantic meaning of Quranic words and to improve performance; its embeddings are trained on a collected classical Arabic corpus of 200 million words. Third, the binary relevance multi-label classification technique is applied using three different classifiers (logistic regression, support vector machine, and random forest), which categorize verses into 393 topics/tags. Finally, the voting module picks the tags with the maximum prediction probability among the three classifiers. The results of the three classifiers and of the ensemble model are compared against “Mushaf Al-Tajweed.” The ensemble model outperforms the three individual classifiers. Its average Hamming loss, recall, precision, and F1-score are 0.224, 81%, 75%, and 77%, respectively.
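The voting and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how "maximum prediction probability" is turned into tags, so the sketch assumes that, for each verse and tag, the highest probability among the three classifiers is kept and compared with a hypothetical threshold of 0.5. The function names, the toy probabilities, and the three-tag setup are all made up for illustration.

```python
from typing import Dict, List

# Assumed layout: Probs[verse][tag] is a classifier's predicted
# probability that the tag applies to the verse.
Probs = List[List[float]]


def max_probability_vote(per_classifier: Dict[str, Probs],
                         threshold: float = 0.5) -> List[List[int]]:
    """Keep, per verse and tag, the highest probability across
    classifiers, then threshold it (the 0.5 cut-off is an assumption)."""
    names = list(per_classifier)
    n_verses = len(per_classifier[names[0]])
    n_tags = len(per_classifier[names[0]][0])
    voted = []
    for i in range(n_verses):
        row = []
        for j in range(n_tags):
            best = max(per_classifier[name][i][j] for name in names)
            row.append(1 if best >= threshold else 0)
        voted.append(row)
    return voted


def hamming_loss(y_true: List[List[int]], y_pred: List[List[int]]) -> float:
    """Fraction of (verse, tag) cells where prediction and truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total


# Toy example: 2 verses, 3 tags, made-up probabilities.
probabilities = {
    "logistic_regression": [[0.9, 0.2, 0.4], [0.1, 0.6, 0.3]],
    "svm": [[0.7, 0.1, 0.55], [0.2, 0.4, 0.2]],
    "random_forest": [[0.8, 0.3, 0.3], [0.05, 0.45, 0.7]],
}
y_true = [[1, 0, 0], [0, 1, 1]]

voted = max_probability_vote(probabilities)
loss = hamming_loss(y_true, voted)
```

In this toy run the ensemble assigns tags `[1, 0, 1]` and `[0, 1, 1]`, so one of the six verse-tag cells disagrees with the reference labels. Taking the maximum makes the ensemble favor recall, since a single confident classifier is enough to assign a tag.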

Pages: 3519-3529

Page count: 10
