A DEEP AUTOENCODER-BASED REPRESENTATION FOR ARABIC TEXT CATEGORIZATION

Authors
El-Alami, Fatima-zahra [1]
El Mahdaouy, Abdelkader [1]
El Alaoui, Said Ouatik [1, 2]
En-Nahnahi, Noureddine [1]
Affiliations
[1] Sidi Mohamed Ben Abdellah Univ, Lab Informat & Modeling, FSDM, Fes, Morocco
[2] Ibn Tofail Univ, Natl Sch Appl Sci, Kenitra, Morocco
Source
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA | 2020, Vol. 19, No. 3
Keywords
Arabic text representation; deep autoencoder; feature selection; machine learning; text categorization
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic text representation is a challenging task for applications such as text categorization and clustering, since the Arabic language is known for its variety, richness, and complex morphology. Until recently, Bag-of-Words has been the most common method for Arabic text representation. However, it suffers from several shortcomings, such as a lack of semantics and a high-dimensional feature space. Moreover, most existing methods ignore the explicit knowledge contained in semantic vocabularies such as Arabic WordNet. To overcome these shortcomings, we proposed a deep autoencoder-based representation for Arabic text categorization. It consisted of three stages: (1) extracting the most relevant concepts from Arabic WordNet through feature selection; (2) learning features for text representation via an unsupervised algorithm; and (3) categorizing text using a deep autoencoder. Our method took document semantics into account by combining implicit and explicit semantics, and it reduced the dimensionality of the feature space. To evaluate our method, we conducted several experiments on the standard Arabic dataset, OSAC. The results showed that the proposed method was effective compared with state-of-the-art methods.
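As a rough illustration of stage (1) only, the sketch below maps tokens to concepts and keeps the concepts that best separate the categories. Several things here are assumptions rather than details from the abstract: the chi-square score is a stand-in, since the abstract does not name the selection criterion; `WORD_TO_CONCEPT` is a hypothetical toy lookup in place of Arabic WordNet; and the English tokens are placeholders for Arabic words. The autoencoder stages are not covered.

```python
from collections import Counter

# Hypothetical toy lookup standing in for Arabic WordNet (assumption).
# English placeholder tokens are used instead of Arabic words.
WORD_TO_CONCEPT = {
    "goal": "sport", "match": "sport", "team": "sport",
    "bank": "finance", "loan": "finance", "market": "finance",
    "today": "time", "year": "time",
}


def to_concepts(tokens):
    """Map tokens to concepts, dropping tokens with no known concept."""
    return [WORD_TO_CONCEPT[t] for t in tokens if t in WORD_TO_CONCEPT]


def chi_square(docs, labels, concept, category):
    """Chi-square statistic for one concept/category pair (2x2 table)."""
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        present = concept in doc
        in_cat = label == category
        if present and in_cat:
            a += 1      # concept present, document in category
        elif present:
            b += 1      # concept present, document in another category
        elif in_cat:
            c += 1      # concept absent, document in category
        else:
            d += 1      # concept absent, document in another category
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_concepts(docs, labels, k):
    """Keep the k concepts with the highest chi-square over all categories."""
    concepts = sorted({c for doc in docs for c in doc})
    categories = sorted(set(labels))
    scores = {
        c: max(chi_square(docs, labels, c, cat) for cat in categories)
        for c in concepts
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(concepts, key=lambda c: (-scores[c], c))[:k]


def vectorize(doc, selected):
    """Count vector of a concept list over the selected concepts."""
    counts = Counter(doc)
    return [counts[c] for c in selected]


# Toy corpus: two "sport" and two "finance" documents.
raw_docs = [
    ["goal", "match", "today"],
    ["team", "year"],
    ["bank", "loan", "today"],
    ["market", "year"],
]
labels = ["sport", "sport", "finance", "finance"]

docs = [to_concepts(tokens) for tokens in raw_docs]
selected = select_concepts(docs, labels, k=2)
vectors = [vectorize(doc, selected) for doc in docs]
```

In this toy corpus the "time" concept occurs in every document, so its score is zero and it is dropped. The resulting low-dimensional count vectors are the kind of input that a later feature-learning stage could consume.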
Pages: 381-398
Page count: 18