Predicting Supervise Machine Learning Performances for Sentiment Analysis Using Contextual-Based Approaches

Cited by: 30
Authors
Aziz, Azwa Abdul [1 ,2 ]
Starkey, Andrew [1 ]
Affiliations
[1] Univ Aberdeen, Sch Engn, Aberdeen AB24 3FX, Scotland
[2] Univ Sultan Zainal Abidin UniSZA, Fac Informat & Comp, Tembila Campus, Kuala Terengganu 22200, Malaysia
Keywords
Text analytics; sentiment analysis; contextual analysis; supervised machine learning; CLASSIFICATION;
DOI
10.1109/ACCESS.2019.2958702
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Sentiment Analysis (SA) focuses on mining opinions (identification and classification) from unstructured text data such as product reviews or microblogs. It is widely used for brand reviews, political campaigns, marketing analysis, and gathering customer feedback. One of the prominent approaches to SA is supervised machine learning (SML), in which an algorithm learns a mathematical model from a training dataset with defined class labels. While the results are promising, especially for in-domain sentiment, there is no guarantee that the model will deliver the same performance on real-time data, owing to the diversity of new data. In addition, previous studies suggest that SML results degrade when applied to cross-domain datasets because new features appear in different domains. So far, studies in SA have emphasised improving sentiment results, whereas there has been little discussion of how to detect performance degradation in a proposed model. Therefore, we provide a method known as Contextual Analysis (CA), a mechanism that constructs relationships between words and sources in a tree structure called the Hierarchical Knowledge Tree (HKT). Then, the Tree Similarity Index (TSI) and Tree Differences Index (TDI), formulas derived from the tree structure, are proposed to find similarities as well as changes between the training and actual datasets. Regression analysis of the datasets reveals a highly significant positive relationship between TSI and SML accuracy. As a result, the prediction model indicated an estimation error within 2.75 to 3.94, and 2.30 to 3.51 for average absolute differences. Moreover, this method can also cluster sentiment words into positive and negative without using any linguistic resources, while at the same time capturing changes in sentiment words when a new dataset is applied.
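The regression step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a simple one-variable linear fit from a similarity score to classifier accuracy, followed by the average absolute difference between predicted and observed accuracy. The abstract does not give the TSI formula or the regression form the authors used, so the function names and all numeric values below are invented placeholders, not results from the paper.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_absolute_difference(predicted, actual):
    """Average of |predicted - actual| over paired values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


# Placeholder data, invented for illustration only (NOT from the paper):
# a similarity score per dataset and the accuracy (%) a classifier reached on it.
similarity = [0.2, 0.4, 0.6, 0.8]
accuracy = [62.0, 70.0, 75.0, 85.0]

slope, intercept = fit_line(similarity, accuracy)
predicted = [slope * s + intercept for s in similarity]
mad = mean_absolute_difference(predicted, accuracy)

print(f"slope={slope:.2f} intercept={intercept:.2f} mean abs diff={mad:.2f}")
```

A positive slope here corresponds to the kind of positive relationship the abstract reports between TSI and accuracy; in practice the fit would be evaluated on held-out datasets rather than on the points used to fit it.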
Pages: 17722-17733
Number of pages: 12