Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features

Cited by: 69
Authors
Al-Smadi, Mohammad [1]
Jaradat, Zain [1]
Al-Ayyoub, Mahmoud [1]
Jararweh, Yaser [1]
Affiliations
[1] Jordan Univ Sci & Technol, Dept Comp Sci, POB 3030, Irbid 22110, Jordan
Keywords
Paraphrase identification; Semantic text similarity; Semantic analysis; Arabic language; Natural language processing
DOI
10.1016/j.ipm.2017.01.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The rapid growth of digital information poses considerable challenges, particularly for automated content analysis. Social media platforms such as Twitter carry a large amount of information about their users' events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach comprises several phases: text processing, feature extraction, and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) models are trained using these features and evaluated on a dataset prepared for this research. The experimental results show that the approach achieves good results in comparison to the baseline results. (c) 2017 Elsevier Ltd. All rights reserved.
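To make the distinction between PI (a yes/no decision) and STS (a graded score) concrete, here is a minimal sketch built on a single lexical feature, Jaccard token overlap. It is an illustration only, not the paper's method: the function names, the whitespace tokenizer, the 0 to 5 score scale, and the 0.5 threshold are all assumptions, and the abstract's approach combines lexical, syntactic, and semantic features with trained MaxEnt and SVR models rather than a fixed rule.

```python
def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def jaccard(a, b):
    """Lexical overlap: shared tokens divided by all distinct tokens."""
    tokens_a, tokens_b = set(tokenize(a)), set(tokenize(b))
    if not tokens_a and not tokens_b:
        return 1.0  # two empty texts are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def sts_score(a, b, scale=5.0):
    """STS-style output: a graded similarity on an assumed 0..scale range."""
    return scale * jaccard(a, b)


def is_paraphrase(a, b, threshold=0.5):
    """PI-style output: a binary decision from an assumed fixed threshold."""
    return jaccard(a, b) >= threshold
```

For example, `jaccard("a b c", "a b d")` shares two of four distinct tokens, so `sts_score` returns a mid-range value while `is_paraphrase` returns a single boolean. A trained model would replace both the fixed threshold and the linear scaling.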
Pages: 640-652
Number of pages: 13