Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity

Cited by: 3
Authors
Mahmoud, Adnen [1]
Zrigui, Mounir [2]
Affiliations
[1] Higher Inst Comp Sci & Commun Tech, Monastir, Tunisia
[2] Fac Sci Monastir, Monastir, Tunisia
Keywords
Arabic Language; Context Based Approach; Global Vectors Representation; Natural Language Processing; Paraphrase Detection; Semantic Similarity; Word Embedding; Word2vec
DOI
10.4018/IJCINI.2020010103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The problem addressed is to develop a model that can reliably identify whether a previously unseen pair of documents is a paraphrase. Paraphrase detection in Arabic documents is challenging because of the variability of its features and the lack of publicly available corpora. To address these problems, the authors propose a semantic approach. At the feature extraction level, they use the global vectors representation (GloVe), which combines global co-occurrence counts with a contextual skip-gram model. At the paraphrase identification level, they apply a convolutional neural network to learn more contextual and semantic information between documents. For the experiments, they use Open Source Arabic Corpora as the source corpus and collect different datasets to build a vocabulary model. To construct the paraphrased corpus, they replace each word of the source corpus with its most similar word of the same grammatical class, using the word2vec algorithm and part-of-speech annotation. Experiments show that the model achieves promising precision and recall compared with existing approaches in the literature.
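The corpus construction step described in the abstract (replace each word with its most similar word of the same grammatical class) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the tiny English vocabulary, the vector values, the part-of-speech tags, and the function names `most_similar_same_pos` and `paraphrase` are all assumptions made up for the example. A real pipeline would use trained word2vec vectors and an Arabic part-of-speech tagger.

```python
import math

# Toy word vectors and POS tags. All values are invented for illustration;
# a real system would load trained word2vec vectors and tagger output.
EMBEDDINGS = {
    "big": [0.90, 0.10, 0.00],
    "large": [0.85, 0.15, 0.05],
    "house": [0.10, 0.90, 0.20],
    "home": [0.12, 0.88, 0.25],
    "enlarge": [0.80, 0.20, 0.10],
}
POS_TAGS = {
    "big": "ADJ",
    "large": "ADJ",
    "house": "NOUN",
    "home": "NOUN",
    "enlarge": "VERB",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_same_pos(word):
    """Return the nearest neighbour of `word` that shares its POS tag.

    Words without a vector, or without a same-class neighbour,
    are returned unchanged.
    """
    if word not in EMBEDDINGS:
        return word
    candidates = [
        other
        for other in EMBEDDINGS
        if other != word and POS_TAGS.get(other) == POS_TAGS.get(word)
    ]
    if not candidates:
        return word
    return max(candidates, key=lambda other: cosine(EMBEDDINGS[word], EMBEDDINGS[other]))


def paraphrase(tokens):
    """Build a paraphrased token list by per-word substitution."""
    return [most_similar_same_pos(token) for token in tokens]


print(paraphrase(["the", "big", "house"]))
```

Filtering candidates by part-of-speech tag before ranking by similarity keeps a semantically close word of a different class (here "enlarge" for "big") from being substituted, which is the role the abstract assigns to the part-of-speech annotation.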
Pages: 35-50
Page count: 16