Leveraging Knowledge-Based Features With Multilevel Attention Mechanisms for Short Arabic Text Classification

Cited by: 6
Authors
Alagha, Iyad [1]
Affiliations
[1] Islamic Univ Gaza, Fac Informat Technol, Gaza 00972, Palestine
Keywords
Internet; Encyclopedias; Online services; Text categorization; Feature extraction; Deep learning; Semantics; Short text classification; Arabic; deep learning; attention mechanism; Wikipedia; INFORMATION-CONTENT; ENTITY LINKING; WIKIPEDIA; ONTOLOGY;
DOI
10.1109/ACCESS.2022.3175306
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812
Abstract
With the widespread use of short texts on social media platforms, there is a growing need for effective short-text classification methods. However, short-text classification remains challenging because short texts are ambiguous and sparse. A common solution is to enrich the short text with additional semantic features extracted from external knowledge, such as Wikipedia, to help the classifier decide on the correct class. Most existing works, however, have focused on English text and benefited from entity-linking tools built on English knowledge bases. For Arabic, the use of external knowledge to support short-text classification has not been widely explored. This work presents an approach for classifying short Arabic text that exploits both Wikipedia-based features and the attention mechanism. First, Wikipedia entities mentioned in the short text are identified. Then, the Wikipedia categories associated with the identified entities are retrieved and filtered to retain only the most relevant ones. A deep learning model with multiple attention mechanisms is then used to encode the short text and the associated category set. Next, the short-text and category representations are combined and fed into the classification layer. Using the attentive model with category filtering highlights the most important features while reducing the effect of improper features. Finally, the proposed model is evaluated by comparing it with several deep learning models.
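The pipeline outlined in the abstract (category filtering, attention over words and categories, then concatenation) can be illustrated with a minimal sketch. This is not the paper's model: the abstract does not specify the encoder, the filtering criterion, or the attention formulation. The sketch assumes toy 3-dimensional embeddings, cosine similarity as a stand-in filtering criterion, and plain dot-product attention; every function name and data value below is hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def attention_pool(vectors, query):
    # Dot-product attention: weight each vector by its similarity to the query.
    weights = softmax([dot(v, query) for v in vectors])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors))
              for i in range(len(query))]
    return pooled, weights

def filter_categories(category_vecs, text_vec, top_k):
    # Assumed criterion: keep the top_k categories most similar to the text.
    ranked = sorted(category_vecs.items(),
                    key=lambda kv: cosine(kv[1], text_vec), reverse=True)
    return dict(ranked[:top_k])

def encode(word_vecs, category_vecs, top_k=2):
    text_summary = mean_vector(word_vecs)
    kept = filter_categories(category_vecs, text_summary, top_k)
    # Word-level attention produces the short-text representation.
    text_repr, _ = attention_pool(word_vecs, text_summary)
    # Category-level attention is guided by the text representation.
    cat_repr, cat_weights = attention_pool(list(kept.values()), text_repr)
    # Concatenated vector that a classification layer would consume.
    return text_repr + cat_repr, list(kept), cat_weights

# Toy data (hypothetical): two word embeddings and three candidate categories.
word_vecs = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
category_vecs = {
    "Football clubs": [0.9, 0.1, 0.0],
    "Sports": [0.7, 0.3, 0.0],
    "1899 establishments": [0.0, 0.0, 1.0],
}
features, kept, cat_weights = encode(word_vecs, category_vecs)
```

In this toy run the off-topic category is dropped by the filter, and the remaining categories receive attention weights according to how closely they match the text representation. A trained model would learn the embeddings and attention parameters instead of using fixed vectors.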
Pages: 51908-51921
Number of pages: 14