Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity

Cited by: 3
Authors
Mahmoud, Adnen [1]
Zrigui, Mounir [2]
Affiliations
[1] Higher Inst Comp Sci & Commun Tech, Monastir, Tunisia
[2] Fac Sci Monastir, Monastir, Tunisia
Keywords
Arabic Language; Context Based Approach; Global Vectors Representation; Natural Language Processing; Paraphrase Detection; Semantic Similarity; Word Embedding; Word2vec
DOI
10.4018/IJCINI.2020010103
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The authors address the problem of reliably identifying whether a previously unseen pair of documents is a paraphrase. Paraphrase detection in Arabic documents is challenging because of feature variability and the lack of publicly available corpora. To address these problems, the authors propose a semantic approach. At the feature extraction level, they use the global vectors (GloVe) representation, which combines global co-occurrence counts with a contextual skip-gram model. At the paraphrase identification level, they apply a convolutional neural network to learn more contextual and semantic information between documents. For the experiments, they use the Open Source Arabic Corpora as the source corpus and collect several datasets to build a vocabulary model. To construct the paraphrased corpus, they replace each word in the source corpus with its most similar word of the same grammatical class, using the word2vec algorithm and part-of-speech annotation. Experiments show that the model achieves promising precision and recall compared with existing approaches in the literature.
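The corpus construction step described in the abstract (replacing each word with its most similar word of the same grammatical class) can be illustrated with a minimal sketch. This is not the authors' implementation: the vectors, the part-of-speech tags, and the function names below are hypothetical stand-ins for a trained word2vec model and a real tagger, and English toy words are used in place of Arabic for readability.

```python
import math

# Hypothetical toy embeddings standing in for a trained word2vec model.
VECTORS = {
    "big": [0.90, 0.10, 0.00],
    "large": [0.85, 0.15, 0.05],
    "quickly": [0.88, 0.12, 0.10],
    "house": [0.10, 0.90, 0.20],
    "home": [0.12, 0.88, 0.25],
    "run": [0.20, 0.10, 0.90],
}

# Hypothetical part-of-speech tags standing in for a tagger's output.
POS = {
    "big": "ADJ",
    "large": "ADJ",
    "quickly": "ADV",
    "house": "NOUN",
    "home": "NOUN",
    "run": "VERB",
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_same_pos(word):
    """Return the nearest neighbour of `word` that shares its POS tag.

    Words outside the vocabulary, or with no same-class candidate,
    are returned unchanged.
    """
    if word not in VECTORS:
        return word
    candidates = [w for w in VECTORS if w != word and POS[w] == POS[word]]
    if not candidates:
        return word
    return max(candidates, key=lambda w: cosine(VECTORS[word], VECTORS[w]))


def paraphrase(tokens):
    """Build a paraphrased token list by substituting each token."""
    return [most_similar_same_pos(t) for t in tokens]


print(paraphrase(["big", "house"]))
```

The part-of-speech filter is what keeps a substitution grammatical: in this toy vocabulary "quickly" lies close to "big" in vector space, but it is never chosen because its tag differs.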
Pages: 35 - 50
Number of pages: 16
Related papers
50 items in total
  • [21] Aspect-Based Semantic Textual Similarity for Educational Test Items
    Do, Heejin
    Lee, Gary Geunbae
    ARTIFICIAL INTELLIGENCE IN EDUCATION, PT II, AIED 2024, 2024, 14830 : 344 - 352
  • [22] Using Sentence Semantic Similarity Based on WordNet in Recognizing Textual Entailment
    Castillo, Julio J.
    Cardenas, Marina E.
    ADVANCES IN ARTIFICIAL INTELLIGENCE - IBERAMIA 2010, 2010, 6433 : 366 - 375
  • [23] Sentence Modeling via Graph Construction and Graph Neural Networks for Semantic Textual Similarity
    Zhou, Ke
    Xu, Ke
    Sun, Tanfeng
    Zhang, Yueguo
    2020 13TH INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, BIOMEDICAL ENGINEERING AND INFORMATICS (CISP-BMEI 2020), 2020 : 413 - 418
  • [24] Incorporating Domain Knowledge Into Language Models by Using Graph Convolutional Networks for Assessing Semantic Textual Similarity: Model Development and Performance Comparison
    Chang, David
    Lin, Eric
    Brandt, Cynthia
    Taylor, Richard Andrew
    JMIR MEDICAL INFORMATICS, 2021, 9 (11)
  • [25] Arabic Text Classification Using Convolutional Neural Network and Genetic Algorithms
    Alsaleh, Deem
    Larabi-Marie-Sainte, Souad
    IEEE ACCESS, 2021, 9 (09) : 91670 - 91685
  • [26] Recognition of Objects for the Description of Images In Arabic language with Convolutional Neural Network
    Farhani, Nada
    Terbeh, Naim
    Zrigui, Mounir
    EDUCATION EXCELLENCE AND INNOVATION MANAGEMENT THROUGH VISION 2020, 2019 : 8947 - 8954
  • [27] SupMPN: Supervised Multiple Positives and Negatives Contrastive Learning Model for Semantic Textual Similarity
    Dehghan, Somaiyeh
    Amasyali, Mehmet Fatih
    APPLIED SCIENCES-BASEL, 2022, 12 (19)
  • [28] Measuring Semantic Similarity Between Sentences Using a Siamese Neural Network
    Ichida, Alexandre Yukio
    Meneguzzi, Felipe
    Ruiz, Duncan D.
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [29] Integrated Semantic Similarity Model Based on Ontology
    Liu, Ya-Jun
    Wuhan University Journal of Natural Sciences, 2004, (05) : 601 - 605
  • [30] A Nonlinear Model to Rank Association Rules Based on Semantic Similarity and Genetic Network Programing
    Yang, Guangfei
    Shimada, Kaoru
    Mabu, Shingo
    Hirasawa, Kotaro
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2009, 4 (02) : 248 - 256