Unbalanced data sentiment classification method based on ensemble learning

Times cited: 2
Authors
Duan, Jidong [1]
Ma, Kun [1]
Sun, Runyuan [1]
Affiliation
[1] University of Jinan, School of Information Science and Engineering, Jinan 250022, People's Republic of China
Source
PROCEEDINGS OF 2019 2ND INTERNATIONAL CONFERENCE ON BIG DATA TECHNOLOGIES (ICBDT 2019) | 2019
Funding
National Natural Science Foundation of China
Keywords
Unbalanced data; sentiment classification; Ensemble learning; stacking; FEATURE-SELECTION METHOD
DOI
10.1145/3358528.3358597
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Sentiment classification is currently an active research topic, but most research is based on balanced data sets. In real applications, samples are rarely balanced. For sentiment analysis of unbalanced data, we need to attend not only to the overall classification performance but also to the classification performance of the minority classes. How to improve the recognition rate of minority-class samples while also improving the overall recognition rate has become a research hotspot. To address this problem, this paper proposes a model based on ensemble learning that extracts features with TF-IDF+SVD and integrates five base classifiers by stacking for sentiment classification. The experimental results show that it classifies sentiment on unbalanced data sets more effectively than the other methods compared.
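A small worked example (not taken from the paper) makes the abstract's point about unbalanced data concrete: the overall recognition rate (accuracy) can look good while a minority class is never recognized. The sketch uses only the Python standard library. The labels "pos"/"neg", the 90/10 class split, and the always-majority classifier are hypothetical, and the macro-averaged recall is an illustrative summary metric, not necessarily the measure used in the paper's experiments.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Overall recognition rate: fraction of all samples labelled correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recognition rate of each class: correct predictions / true samples of that class."""
    support = Counter(y_true)  # number of true samples per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return {label: hits[label] / n for label, n in support.items()}


def macro_recall(y_true, y_pred):
    """Unweighted mean of per-class recall, so every class counts equally."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


if __name__ == "__main__":
    # Hypothetical unbalanced test set: 90 positive reviews, 10 negative ones.
    y_true = ["pos"] * 90 + ["neg"] * 10
    # Hypothetical degenerate classifier that always predicts the majority class.
    y_pred = ["pos"] * 100
    print(f"accuracy     = {accuracy(y_true, y_pred):.2f}")
    print(f"recall(neg)  = {per_class_recall(y_true, y_pred)['neg']:.2f}")
    print(f"macro recall = {macro_recall(y_true, y_pred):.2f}")
```

Running it prints accuracy = 0.90, recall(neg) = 0.00 and macro recall = 0.50: a 90% overall recognition rate hides the fact that not a single negative review is recognized, which is why per-class measures are worth reporting alongside accuracy on unbalanced data sets.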
Pages: 34-38
Number of pages: 5