Comprehension of polarity of articles by citation sentiment analysis using TF-IDF and ML classifiers

Times cited: 0
Authors
Karim M. [1]
Missen M.M.S. [1]
Umer M. [1]
Fida A. [1]
Eshmawi A.A. [2]
Mohamed A. [3]
Ashraf I. [4]
Affiliations
[1] Department of Computer Science & Information Technology, Islamia University, Bahawalpur
[2] University of Jeddah, Department of Cybersecurity, College of Computer Science and Engineering, Jeddah
[3] University Research Centre, Future University, Cairo
[4] Information and Communication Engineering, Yeungnam University, Gyeongsan
Keywords
Citation sentiment analysis; Dataset balancing; Machine learning; SMOTE; Term frequency-inverse document frequency
DOI
10.7717/PEERJ-CS.1107
Abstract
Sentiment analysis has been researched extensively in recent years; however, sentiment analysis of the citations in research articles remains an unexplored research area. Citation sentiment analysis can enable new applications in bibliometrics and provide insights for a better understanding of scientific knowledge. Citation count, as it is used today to measure the quality of a paper, does not necessarily reflect the quality of a scientific article, because an article may be cited to point out its weaknesses. Determining the polarity of a citation is therefore important for quantifying the quality of the cited article and assessing its impact and ranking. This article presents an approach to determine the polarity of the cited article using term frequency-inverse document frequency (TF-IDF) and machine learning classifiers. To analyze the influence of an imbalanced dataset, several experiments are performed with and without the synthetic minority oversampling technique (SMOTE), using both uni-gram and bi-gram TF-IDF features. Results indicate that the proposed methodology achieves an accuracy of 99.0% with the extra tree classifier when trained on the SMOTE-oversampled dataset with bi-gram features. © 2022 Karim et al.
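The abstract combines n-gram TF-IDF features, SMOTE oversampling of the minority class, and a tree-ensemble classifier. The dependency-free Python sketch below illustrates only the first two steps, under stated assumptions; it is not the authors' implementation. The helper names `ngrams`, `tfidf`, `euclidean` and `smote`, the lowercase whitespace tokenizer, the weighting tf = count / terms-in-document with idf = ln(N / df) + 1, the reading of "bi-gram" as uni-grams plus bi-grams, and the toy sentences are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def ngrams(tokens, n_max):
    """Return all 1..n_max-grams of a token list, each joined with a space."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Dense TF-IDF vectors over uni-grams (n_max=1) or uni+bi-grams (n_max=2).

    Assumed weighting: tf = count / terms in doc, idf = ln(N / df) + 1.
    """
    # Naive lowercase + whitespace tokenization (an assumption for this sketch).
    bags = [Counter(ngrams(doc.lower().split(), n_max)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    vocab = sorted(df)
    vectors = []
    for bag in bags:
        total = sum(bag.values()) or 1  # guard against empty documents
        # Counter returns 0 for absent terms, so those weights are 0.0.
        vectors.append([
            bag[t] / total * (math.log(n_docs / df[t]) + 1.0) for t in vocab
        ])
    return vocab, vectors


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples, SMOTE-style.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


if __name__ == "__main__":
    # Invented toy citation sentences: 3 positive vs 2 negative.
    docs = [
        "this method clearly outperforms prior work",
        "their approach fails on large datasets",
        "a strong and robust baseline",
        "results are weak and hard to reproduce",
        "an elegant and effective model",
    ]
    labels = ["pos", "neg", "pos", "neg", "pos"]
    vocab, X = tfidf(docs, n_max=2)
    minority = [x for x, y in zip(X, labels) if y == "neg"]
    extra = smote(minority, n_new=1, k=1)  # one synthetic sample balances 3 vs 3
    X_bal = X + extra
    y_bal = labels + ["neg"] * len(extra)
    print(len(vocab), Counter(y_bal))
```

In practice the balanced vectors would then be passed to a classifier such as scikit-learn's `ExtraTreesClassifier`, and libraries such as scikit-learn's `TfidfVectorizer` (`ngram_range`) and imbalanced-learn's `SMOTE` use their own tf and idf weighting and neighbour defaults. SMOTE is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.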