A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Cited by: 0
Authors
El-Alami, Fatima-Zahra [1]
El Mahdaouy, Abdelkader [1]
El Alaoui, Said Ouatik [1, 2]
En-Nahnahi, Noureddine [1]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Lab Informat & Modeling, FSDM, Fes, Morocco
[2] Ibn Tofail Univ, Natl Sch Appl Sci, Kenitra, Morocco
Keywords
Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization
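DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract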
Arabic text representation is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness and complex morphology. Bag-of-Words remains the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and the high dimensionality of the feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep Autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet based on feature selection processes; (2) learning features via an unsupervised algorithm for text representation; (3) categorizing text using a deep Autoencoder. Our method accounted for document semantics by combining implicit and explicit semantics, and it reduced the dimensionality of the feature space. To evaluate our method, we conducted several experiments on the standard Arabic dataset OSAC. The results showed the effectiveness of the proposed method compared to state-of-the-art ones.
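The first two stages named in the abstract can be illustrated with a minimal sketch. The abstract does not state which feature selection criterion or autoencoder architecture the authors used, so everything below is an assumption made for illustration: chi-square scoring stands in for the "feature selection processes", a single-hidden-layer autoencoder trained by plain gradient descent stands in for the deep model, and the toy English concept labels stand in for Arabic WordNet concepts. The function names (`chi_square_scores`, `select_top_k`, `to_vector`, `train_autoencoder`) are hypothetical, and the classification stage is not shown.

```python
import math
import random


def chi_square_scores(docs, labels):
    """Score each concept by its best chi-square value over all categories.

    docs: list of sets of concept identifiers (assumed already extracted).
    labels: list of category names, one per document.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    categories = sorted(set(labels))
    scores = {}
    for concept in vocab:
        best = 0.0
        for cat in categories:
            # 2x2 contingency counts: concept present/absent vs. in/out of category
            a = sum(1 for d, y in zip(docs, labels) if concept in d and y == cat)
            b = sum(1 for d, y in zip(docs, labels) if concept in d and y != cat)
            c = sum(1 for d, y in zip(docs, labels) if concept not in d and y == cat)
            d_neg = n - a - b - c
            denom = (a + b) * (c + d_neg) * (a + c) * (b + d_neg)
            if denom:
                best = max(best, n * (a * d_neg - b * c) ** 2 / denom)
        scores[concept] = best
    return scores


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring concepts (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def to_vector(doc, selected):
    """Binary vector over the selected concepts."""
    return [1.0 if concept in doc else 0.0 for concept in selected]


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def train_autoencoder(X, hidden, epochs=200, lr=0.5, seed=0):
    """Train a one-hidden-layer autoencoder with squared error; return the encoder."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(n_in)]
    b2 = [0.0] * n_in

    def encode(x):
        return [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
                for j in range(hidden)]

    for _ in range(epochs):
        for x in X:
            h = encode(x)
            y = [sigmoid(sum(w * hj for w, hj in zip(W2[i], h)) + b2[i])
                 for i in range(n_in)]
            # Backpropagate the reconstruction error through both layers
            dy = [(y[i] - x[i]) * y[i] * (1.0 - y[i]) for i in range(n_in)]
            dh = [sum(dy[i] * W2[i][j] for i in range(n_in)) * h[j] * (1.0 - h[j])
                  for j in range(hidden)]
            for i in range(n_in):
                for j in range(hidden):
                    W2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(hidden):
                for i in range(n_in):
                    W1[j][i] -= lr * dh[j] * x[i]
                b1[j] -= lr * dh[j]
    return encode


# Toy usage with made-up concepts and categories
docs = [{"sport", "ball"}, {"sport", "team"}, {"bank", "money"}, {"bank", "team"}]
labels = ["sports", "sports", "economy", "economy"]
selected = select_top_k(docs, labels, 2)
encode = train_autoencoder([to_vector(d, selected) for d in docs], hidden=1)
print(selected, encode(to_vector(docs[0], selected)))
```

In this sketch the encoder output would serve as the reduced document representation passed to a classifier. A deeper model would stack several such layers, which is omitted here for brevity.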
Pages: 381-398
Page count: 18
Related papers
50 in total
  • [1] Word Sense Representation based-method for Arabic Text Categorization
    El-Alami, Fatima-Zahra
    Ouatik El Alaoui, Said
    9TH INTERNATIONAL SYMPOSIUM ON SIGNAL, IMAGE, VIDEO AND COMMUNICATIONS (ISIVC 2018), 2018, : 141 - 146
  • [2] Arabic text categorization based on Arabic Wikipedia
    Yahya, A. (yahya@birzeit.edu), 1600, Association for Computing Machinery, 2 Penn Plaza, Suite 701, New York, NY 10121-0701, United States (13):
  • [3] A Deep Autoencoder-Based Knowledge Transfer Approach
    Tirumala, Sreenivas Sremath
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA ENGINEERING, 2018, 9 : 277 - 284
  • [4] A Superior Arabic Text Categorization Deep Model (SATCDM)
    Alhawarat, M.
    Aseeri, Ahmad O.
    IEEE ACCESS, 2020, 8 : 24653 - 24661
  • [5] Deep Neural Models and Retrofitting for Arabic Text Categorization
    El-Alami, Fatima-Zahra
    El Alaoui, Said Ouatik
    En-Nahnahi, Noureddine
    INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2020, 16 (02) : 74 - 86
  • [6] A Deep Autoencoder-Based Hybrid Recommender System
    Bougteb, Yahya
    Ouhbi, Brahim
    Frikh, Bouchra
    Zemmouri, Elmoukhtar
    INTERNATIONAL JOURNAL OF MOBILE COMPUTING AND MULTIMEDIA COMMUNICATIONS, 2022, 13 (01)
  • [7] Randomized Autoencoder-based Representation for Dynamic Texture Recognition
    Fares, Ricardo T.
    Ribas, Lucas C.
    2024 31ST INTERNATIONAL CONFERENCE ON SYSTEMS, SIGNALS AND IMAGE PROCESSING, IWSSIP 2024, 2024,
  • [8] Arabic Text Categorization: a Comparative Study of Different Representation Modes
    Elberrichi, Zakaria
    Abidi, Karima
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2012, 9 (05) : 465 - 470
  • [9] Arabic text categorization: A comparative study of different representation modes
    Karima, A. (k_abidi@esi.dz), 1600, Asian Research Publishing Network (ARPN) (38):
  • [10] Deep Autoencoder-based Z-Interference Channels
    Zhang, Xinliang
    Vaezi, Mojtaba
    2023 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC, 2023,