Sentiment analysis using semantic similarity and Hadoop MapReduce

Cited by: 20
Authors
Madani, Youness [1]
Erritali, Mohammed [1]
Bengourram, Jamaa [1]
Affiliations
[1] Sultan Moulay Slimane Univ, Dept Comp Sci, Fac Sci & Tech, Beni Mellal, Morocco
Keywords
Opinion mining; Sentiment analysis; Semantic similarity; WordNet; Big data; Hadoop
DOI
10.1007/s10115-018-1212-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis, or opinion mining, is a field that analyses people's opinions, sentiments, evaluations, attitudes, and emotions expressed in written language; it has become a very active area of scientific research in recent years, especially with the growth of social networks such as Facebook and Twitter. In this paper we propose two new approaches for classifying tweets (that is, identifying the feeling expressed in a tweet): the first uses three classes (negative, positive, or neutral), and the second uses two classes (negative or positive). The first method computes the semantic similarity between the tweet to classify and three documents, each of which represents a class (it contains the words that represent that class); the tweet is then assigned the class of the document with which it has the greatest semantic similarity. The second method computes the semantic similarity between each word of the tweet to classify and the words "positive" and "negative", using a new formula that we propose. We perform the analysis in a parallel and distributed way, using the Hadoop framework with the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to address the long computation time of the analysis when the tweet dataset is very large. The aim of our work is to combine several domains: information retrieval, semantic similarity, opinion mining (sentiment analysis), and big data.
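The first method and the MapReduce split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class documents are tiny hypothetical word lists, the similarity is a plain Jaccard overlap standing in for a WordNet-based semantic similarity, and the map and reduce steps are ordinary Python functions that only mimic the shape of a Hadoop job. All names (`CLASS_DOCUMENTS`, `tokenize`, `similarity`, `classify`, `map_phase`, `reduce_phase`) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical class documents: small word sets standing in for the three
# documents described in the abstract (the real documents are not given here).
CLASS_DOCUMENTS = {
    "positive": {"good", "great", "love", "happy", "excellent"},
    "negative": {"bad", "awful", "hate", "sad", "terrible"},
    "neutral": {"today", "meeting", "report", "weather", "schedule"},
}


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip basic punctuation."""
    return {word.strip(".,!?;:").lower() for word in text.split()} - {""}


def similarity(tweet_words, document_words):
    """Jaccard overlap, used only as a stand-in for a semantic similarity measure."""
    union = tweet_words | document_words
    if not union:
        return 0.0
    return len(tweet_words & document_words) / len(union)


def classify(tweet):
    """Return the class whose document has the greatest similarity to the tweet."""
    words = tokenize(tweet)
    return max(
        CLASS_DOCUMENTS,
        key=lambda label: similarity(words, CLASS_DOCUMENTS[label]),
    )


def map_phase(tweets):
    """Map step: emit one (class, 1) pair per tweet."""
    return [(classify(tweet), 1) for tweet in tweets]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each class."""
    totals = Counter()
    for label, count in pairs:
        totals[label] += count
    return dict(totals)


if __name__ == "__main__":
    sample = [
        "I love this great phone!",
        "What an awful, terrible day",
        "Weather report for today",
    ]
    print(reduce_phase(map_phase(sample)))
```

In this sketch a tweet that shares no word with any class document falls back to the first class in dictionary order, so a real system would need an explicit tie-breaking rule. Because each tweet is classified independently, the map step is the part that a framework such as Hadoop could distribute across nodes.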
Pages: 413-436
Number of pages: 24