Machine learning-based multi-documents sentiment-oriented summarization using linguistic treatment

Cited by: 48
Authors
Abdi, Asad [1 ]
Shamsuddin, Siti Mariyam [1 ]
Hasan, Shafaatunnur [1 ]
Piran, Md Jalil [2 ]
Affiliations
[1] Univ Teknol Malaysia, UTM Big Data Ctr BDC, Skudai 81310, Johor, Malaysia
[2] Sejong Univ, Dept Comp Sci & Engn, Seoul, South Korea
Keywords
Sentiment analysis; Sentiment summarization; Machine Learning; Sentiment knowledge; Word embedding; AGREEMENT; SELECTION; ENSEMBLE; REVIEWS; MODEL;
DOI
10.1016/j.eswa.2018.05.010
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Sentiment summarization is the process of automatically creating a compressed version of the opinionated information expressed in a text. This paper presents a machine learning-based approach to summarizing users' opinions expressed in reviews using: (1) sentiment knowledge, to calculate a sentence sentiment score as one of the features for sentence-level classification, integrating multiple strategies to handle sentiment shifters, sentence types, and the word coverage limit; (2) a word embedding model, a deep-learning-inspired method for capturing the meaning of and semantic relationships among words and for extracting a vector representation of each word; and (3) statistical and linguistic knowledge, to determine salient sentences. The proposed method combines several types of features into a unified feature set to build a more accurate classification system ("True": the sentence belongs to the extractive reference summary; "False": otherwise). To achieve better performance, we carried out a performance study of four well-known feature selection techniques and seven widely used classifiers in order to select the most relevant set of features and to find an efficient machine learning classifier, respectively. The proposed method is applied to three different datasets, and the results show that combining a support vector machine-based classifier with Information Gain (IG) as the feature selection technique significantly improves performance and makes the method comparable to other existing methods. Furthermore, our method, which learns from the unified feature set, obtains better performance than one that learns from a feature subset. © 2018 Elsevier Ltd. All rights reserved.
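The feature selection step named in the abstract, Information Gain (IG), can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes discrete feature values, the feature names and toy data are hypothetical, and the classifier stage (an SVM in the paper) is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels):
    """Return (feature names sorted by decreasing IG, dict of IG scores)."""
    names = list(rows[0].keys())
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(scores, key=scores.get, reverse=True), scores


# Hypothetical toy data: one row per sentence, label True if the sentence
# belongs to the extractive reference summary (the "True" class above).
rows = [
    {"has_sentiment_word": 1, "is_first_sentence": 1},
    {"has_sentiment_word": 1, "is_first_sentence": 0},
    {"has_sentiment_word": 0, "is_first_sentence": 1},
    {"has_sentiment_word": 0, "is_first_sentence": 0},
]
labels = [True, True, False, False]

ranking, scores = rank_features(rows, labels)
```

In this toy data the hypothetical feature `has_sentiment_word` separates the two classes perfectly and is ranked first, while `is_first_sentence` carries no information about the label. A real pipeline would keep the top-ranked features and pass them to the classifier.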
Pages: 66-85
Number of pages: 20
Related papers
101 in total
[1]  
Abdi A., 2015, SOFT COMPUT, P1
[2]   QMOS: Query-based multi-documents opinion-oriented summarization [J].
Abdi, Asad ;
Shamsuddin, Siti Mariyam ;
Aliguliyev, Ramiz M. .
INFORMATION PROCESSING & MANAGEMENT, 2018, 54 (02) :318-338
[3]   An Automated Summarization Assessment Algorithm for Identifying Summarizing Strategies [J].
Abdi, Asad ;
Idris, Norisma ;
Alguliyev, Rasim M. ;
Aliguliyev, Ramiz M. .
PLOS ONE, 2016, 11 (01)
[4]   PDLK: Plagiarism detection using linguistic knowledge [J].
Abdi, Asad ;
Idris, Norisma ;
Alguliyev, Rasim M. ;
Aliguliyev, Ramiz M. .
EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (22) :8936-8946
[5]   Automatic summarization assessment through a combination of semantic and syntactic information for intelligent educational systems [J].
Abdi, Asad ;
Idris, Norisma ;
Alguliev, Rasim M. ;
Aliguliyev, Ramiz M. .
INFORMATION PROCESSING & MANAGEMENT, 2015, 51 (04) :340-358
[6]  
Abdi S. A., 2014, INT J ENHANCED RES S, V3, P466
[7]  
Alguliyev R. M., 2015, APPL SOFT COMPUTING
[8]   An unsupervised approach to generating generic summaries of documents [J].
Alguliyev, Rasim M. ;
Aliguliyev, Ramiz M. ;
Isazade, Nijat R. .
APPLIED SOFT COMPUTING, 2015, 34 :236-250
[9]  
Alonso L., 2005, THESIS
[10]  
Amplayo R. K., 2017, DATA KNOWLEDGE ENG