Distributional Semantic Model Based on Convolutional Neural Network for Arabic Textual Similarity

Cited by: 3
Authors
Mahmoud, Adnen [1]
Zrigui, Mounir [2]
Affiliations
[1] Higher Inst Comp Sci & Commun Tech, Monastir, Tunisia
[2] Fac Sci Monastir, Monastir, Tunisia
Keywords
Arabic Language; Context Based Approach; Global Vectors Representation; Natural Language Processing; Paraphrase Detection; Semantic Similarity; Word Embedding; Word2vec;
DOI
10.4018/IJCINI.2020010103
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The problem addressed is to develop a model that can reliably identify whether a previously unseen document pair is a paraphrase. Paraphrase detection in Arabic documents is challenging because of the variability of the language's features and the lack of publicly available corpora. To address these problems, the authors propose a semantic approach. At the feature-extraction level, the authors use the global vectors (GloVe) representation, which combines global co-occurrence counting with a contextual skip-gram model. At the paraphrase-identification level, the authors apply a convolutional neural network (CNN) model to learn more contextual and semantic information between documents. For the experiments, the authors use the Open Source Arabic Corpora as the source corpus and collect different datasets to build a vocabulary model. To construct the paraphrased corpus, the authors replace each word in the source corpus with its most similar word of the same grammatical class, using the word2vec algorithm and part-of-speech annotation. Experiments show that the model achieves promising precision and recall compared with existing approaches in the literature.
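The corpus-construction step described in the abstract (replacing each word with its nearest neighbour in embedding space that shares its part-of-speech tag) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the transliterated toy vocabulary, the three-dimensional vectors in TOY_EMBEDDINGS, the POS lexicon TOY_POS, and the function names are all hypothetical stand-ins for a trained word2vec model and a real Arabic POS tagger.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy data standing in for trained word2vec vectors
# and a POS-annotated lexicon (transliterated Arabic placeholders).
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "kabir": [1.0, 0.1, 0.0],    # "big"
    "adhim": [0.9, 0.2, 0.0],    # "great"
    "saghir": [-1.0, 0.1, 0.0],  # "small"
    "bayt": [0.0, 0.0, 1.0],     # "house"
    "manzil": [0.1, 0.0, 0.9],   # "home"
}
TOY_POS: Dict[str, str] = {
    "kabir": "ADJ", "adhim": "ADJ", "saghir": "ADJ",
    "bayt": "NOUN", "manzil": "NOUN",
}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar_same_pos(word: str, pos: str,
                          embeddings: Dict[str, Vector],
                          lexicon_pos: Dict[str, str]) -> str:
    """Return the most similar other word with the same POS tag.

    Falls back to the original word if it has no vector or if no
    candidate shares its POS tag.
    """
    if word not in embeddings:
        return word
    best, best_sim = word, -math.inf
    for cand, vec in embeddings.items():
        if cand == word or lexicon_pos.get(cand) != pos:
            continue  # skip the word itself and other grammatical classes
        sim = cosine(embeddings[word], vec)
        if sim > best_sim:
            best, best_sim = cand, sim
    return best


def paraphrase(tagged: List[Tuple[str, str]],
               embeddings: Dict[str, Vector],
               lexicon_pos: Dict[str, str]) -> List[str]:
    """Substitute every (word, POS) token with its same-POS nearest neighbour."""
    return [most_similar_same_pos(w, p, embeddings, lexicon_pos) for w, p in tagged]


if __name__ == "__main__":
    sentence = [("bayt", "NOUN"), ("kabir", "ADJ")]
    print(paraphrase(sentence, TOY_EMBEDDINGS, TOY_POS))  # ['manzil', 'adhim']
```

The POS filter is what keeps the substitution grammatical: without it, "kabir" could be replaced by any close vector regardless of word class. A brute-force scan over the vocabulary is used here only for clarity; a real system would query a nearest-neighbour index over the trained embeddings.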
Pages: 35-50
Page count: 16
Related papers (50 in total)
  • [31] Semantic Textual Similarity on Brazilian Portuguese: An approach based on language-mixture models
    Silva, A.
    Lozkins, A.
    Bertoldi, L. R.
    Rigo, S.
    Bure, V. M.
    VESTNIK SANKT-PETERBURGSKOGO UNIVERSITETA SERIYA 10 PRIKLADNAYA MATEMATIKA INFORMATIKA PROTSESSY UPRAVLENIYA, 2019, 15 (02): 235-244
  • [32] Building the Semantic Similarity Model for Social Network Data Streams
    Petrasova, Svitlana
    Khairova, Nina
    Lewoniewski, Wlodzimierz
    2018 IEEE SECOND INTERNATIONAL CONFERENCE ON DATA STREAM MINING & PROCESSING (DSMP), 2018: 21-24
  • [33] Prioritizing CircRNA-Disease Associations With Convolutional Neural Network Based on Multiple Similarity Feature Fusion
    Fan, Chunyan
    Lei, Xiujuan
    Pan, Yi
    FRONTIERS IN GENETICS, 2020, 11
  • [34] A novel model for semantic similarity measurement based on wordnet and word embedding
    Zhao, Fuqiang
    Zhu, Zhengyu
    Han, Ping
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 40 (05): 9831-9842
  • [35] Measurement of Semantic Textual Similarity in Clinical Texts: Comparison of Transformer-Based Models
    Yang, Xi
    He, Xing
    Zhang, Hansi
    Ma, Yinghan
    Bian, Jiang
    Wu, Yonghui
    JMIR MEDICAL INFORMATICS, 2020, 8 (11)
  • [36] Exploring Semantic Similarity Measure Based on Word Embedding Representation for Arabic Passages Retrieval
    Lahbari, Imane
    El Alaoui, Said Ouatik
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418: 978-989
  • [37] A novel locality-sensitive hashing relational graph matching network for semantic textual similarity measurement
    Li, Haozhe
    Wang, Wenhai
    Liu, Zhaoran
    Niu, Yunlong
    Wang, Hao
    Zhao, Shunping
    Liao, Yilin
    Yang, Weigeng
    Liu, Xinggao
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 207
  • [38] An Ontology-Based Semantic Similarity Computation Model
    Yang, Yuehua
    Ping, Yuan
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018: 561-564
  • [39] Summary Model based on Semantic Similarity Attention Focus
    Qin, Fei
    Wei, Lixing
    Wang, Pin
    2022 6TH INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND INTELLIGENT CONTROL, ISCSIC, 2022: 21-24
  • [40] Semantic Capture Analysis in Word Embedding Vectors Using Convolutional Neural Network
    Navarro-Almanza, Raul
    Licea, Guillermo
    Juarez-Ramirez, Reyes
    Mendoza, Olivia
    RECENT ADVANCES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 1, 2017, 569: 106-114