Building Fake Review Detection Model Based on Sentiment Intensity and PU Learning

Cited by: 24
Authors
Zhang, Shunxiang [1, 2]
Zhu, Aoqiang [1]
Zhu, Guangli [1]
Wei, Zhongliang [1]
Li, KuanChing [3]
Affiliations
[1] Anhui Univ Sci & Technol, Sch Comp Sci & Engn, Huainan 231001, Peoples R China
[2] Artificial Intelligence Res Inst, Hefei Comprehens Natl Sci Ctr, Hefei 230000, Peoples R China
[3] Providence Univ, Dept Comp Sci & Informat Engn CSIE, Taichung 43301, Taiwan
Funding
National Natural Science Foundation of China;
Keywords
Training; Dictionaries; Blogs; Data models; Sentiment analysis; Predictive models; Feature extraction; Fake reviews; positive-unlabeled (PU) learning; semi-supervised learning; sentiment analysis; DATA STREAM; FUSION; SYSTEM; NEWS;
DOI
10.1109/TNNLS.2023.3234427
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fake review detection must handle review streams that are large in scale, grow without bound, and change dynamically. However, existing fake review detection methods mainly target limited, static review data. In addition, deceptive fake reviews have long been difficult to detect because they are well hidden and diverse. To address these problems, this article proposes a fake review detection model based on sentiment intensity and positive-unlabeled (PU) learning (SIPUL), which continuously learns the prediction model from constantly arriving streaming data. First, when the streaming data arrive, sentiment intensity is used to divide the reviews into subsets (i.e., a strong sentiment set and a weak sentiment set). Second, the initial positive and negative samples are extracted from the subsets using the selected completely at random (SCAR) labeling mechanism and the Spy technique. Third, a semi-supervised PU learning detector is built from the initial samples to detect fake reviews in the data stream iteratively. According to the detection results, the initial sample data and the PU learning detector are continuously updated. Finally, old data are continually deleted according to the historical record points, which keeps the training sample data at a manageable size and prevents overfitting. Experimental results show that the model can effectively detect fake reviews, especially deceptive reviews.
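The first two steps of the abstract (splitting reviews by sentiment intensity, then using spies to pick initial negative samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the intensity threshold of 0.5, the 20% spy ratio, and the nearest-centroid scorer are all assumptions made for illustration; the record does not say which classifier or thresholds SIPUL uses. The sketch follows the general idea of the Spy technique: hide a few labeled positives ("spies") in the unlabeled set, score everything, and treat unlabeled items that score below every spy as reliable negatives.

```python
import random
from math import dist


def split_by_intensity(reviews, threshold=0.5):
    """Split (features, intensity) pairs into strong and weak sentiment sets.

    The threshold value is an illustrative assumption.
    """
    strong = [r for r in reviews if abs(r[1]) >= threshold]
    weak = [r for r in reviews if abs(r[1]) < threshold]
    return strong, weak


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def spy_reliable_negatives(positives, unlabeled, spy_ratio=0.2, seed=0):
    """Return unlabeled vectors that score below every spy.

    A nearest-centroid score stands in for a trained classifier
    (an assumption made to keep the sketch dependency-free).
    """
    if len(positives) < 2 or not unlabeled:
        raise ValueError("need at least two positives and one unlabeled item")
    rng = random.Random(seed)
    n_spies = min(len(positives) - 1, max(1, int(len(positives) * spy_ratio)))
    spy_idx = set(rng.sample(range(len(positives)), n_spies))
    spies = [p for i, p in enumerate(positives) if i in spy_idx]
    kept = [p for i, p in enumerate(positives) if i not in spy_idx]

    c_pos = centroid(kept)               # remaining labeled positives
    c_unl = centroid(unlabeled + spies)  # unlabeled set with spies mixed in

    def score(x):
        # Higher means closer to the positives than to the unlabeled pool.
        return dist(x, c_unl) - dist(x, c_pos)

    threshold = min(score(s) for s in spies)
    return [u for u in unlabeled if score(u) < threshold]


if __name__ == "__main__":
    # Toy 2-D feature vectors; the values are made up for the demo.
    positives = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.2], [1.2, 1.0]]
    unlabeled = [[1.0, 1.05], [0.95, 1.0], [-1.0, -1.0], [-1.1, -0.9], [-0.9, -1.2]]
    print(spy_reliable_negatives(positives, unlabeled))
```

Using the minimum spy score as the cutoff is the strictest choice and can still mislabel unlabeled items that resemble the positives; a percentile of the spy scores is a common, more tolerant alternative.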
Pages: 6926-6939
Number of pages: 14
Related papers
59 in total
[1]   Building text classifiers using positive and unlabeled examples [J].
Liu, B ;
Dai, Y ;
Li, XL ;
Lee, WS ;
Yu, PS .
THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003, :179-186
[2]   A survey on fake news and rumour detection techniques [J].
Bondielli, Alessandro ;
Marcelloni, Francesco .
INFORMATION SCIENCES, 2019, 497 :38-55
[3]   Semi-supervised clue fusion for spammer detection in Sina Weibo [J].
Chen, Hao ;
Liu, Jun ;
Lv, Yanzhang ;
Li, Max Haifei ;
Liu, Mengyue ;
Zheng, Qinghua .
INFORMATION FUSION, 2018, 44 :22-32
[4]   Cherkassky V, 1997, IEEE Trans Neural Netw, V8, P1564, DOI 10.1109/TNN.1997.641482
[5]   Identifying malicious social media contents using multi-view Context-Aware active learning [J].
Das Bhattacharjee, Sreyasee ;
Tolone, William J. ;
Paranjape, Ved Suhas .
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2019, 100 :365-379
[6]   Maximum likelihood from incomplete data via the EM algorithm [J].
Dempster, AP ;
Laird, NM ;
Rubin, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[7]   Dynamic knowledge graph based fake-review detection [J].
Fang, Youli ;
Wang, Hong ;
Zhao, Lili ;
Yu, Fengping ;
Wang, Caiyu .
APPLIED INTELLIGENCE, 2020, 50 (12) :4281-4295
[8]   Open Set Domain Adaptation: Theoretical Bound and Algorithm [J].
Fang, Zhen ;
Lu, Jie ;
Liu, Feng ;
Xuan, Junyu ;
Zhang, Guangquan .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (10) :4309-4322
[9]   Large-Margin Label-Calibrated Support Vector Machines for Positive and Unlabeled Learning [J].
Gong, Chen ;
Liu, Tongliang ;
Yang, Jian ;
Tao, Dacheng .
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (11) :3471-3483
[10]   Fake consumer review detection using deep neural networks integrating word embeddings and emotion mining [J].
Hajek, Petr ;
Barushka, Aliaksandr ;
Munk, Michal .
NEURAL COMPUTING & APPLICATIONS, 2020, 32 (23) :17259-17274