Predictive intelligence in harmful news identification by BERT-based ensemble learning model with text sentiment analysis

Cited by: 63
Authors
Lin, Szu-Yin [1]
Kung, Yun-Ching [2]
Leu, Fang-Yie [3]
Affiliations
[1] Natl Ilan Univ, Dept Comp Sci & Informat Engn, Yilan, Taiwan
[2] Chung Yuan Christian Univ, Dept Informat Management, Taoyuan, Taiwan
[3] Tunghai Univ, Dept Comp Sci, Taichung, Taiwan
Keywords
Information disorder; Harmful news analysis; Natural language processing; News sentiment analysis; Ensemble learning; BERT model; INFORMATION
DOI
10.1016/j.ipm.2022.102872
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In an environment full of disordered information, the media spreads fake or harmful information into the public arena faster than ever before. A news report should ideally be neutral and factual, without excessive personal emotions or viewpoints, and news articles should not be written with malicious intent or to create media framing. Harmful news is defined as news text containing explicit or implicit harmful speech that harms people or affects readers' perception. At present, however, it is difficult to identify and predict fake or harmful news in advance, and harmful news in particular. In this study, we therefore propose a Bidirectional Encoder Representations from Transformers (BERT) based model that applies ensemble learning methods together with text sentiment analysis to identify harmful news. The aim is to give readers a way to identify harmful news content and so help them judge whether the information is presented in a neutral manner. The proposed system works in two phases. The first phase collects harmful news and establishes a development model for analyzing the correlation between text sentiment and harmful news. The second phase identifies harmful news by analyzing text sentiment with an ensemble learning technique and the BERT model, in order to determine whether the news has harmful intentions. Our experimental results show that the F1-score of the proposed model reaches 66.3%, an increase of 7.8% over the previous term frequency-inverse document frequency (TF-IDF) approach, which adopts a Lagrangian Support Vector Machine (LSVM) model without text sentiment. Moreover, the proposed method achieves better performance in recognizing various cases of information disorder.
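The abstract does not say which ensemble rule the authors use, so the following is only a minimal sketch of one common choice, soft voting, together with the F1-score used to report results. The function names, the 0.5 threshold, and the idea of averaging a BERT-based probability with a sentiment-based probability are all assumptions made for illustration, not the paper's implementation.

```python
from typing import Optional, Sequence


def soft_vote(probabilities: Sequence[float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of per-model probabilities that an article is harmful.

    Each entry is assumed to come from one base model, e.g. a hypothetical
    BERT classifier and a hypothetical sentiment-feature classifier.
    """
    if not probabilities:
        raise ValueError("need at least one model probability")
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("weights and probabilities must have equal length")
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def is_harmful(probabilities: Sequence[float],
               weights: Optional[Sequence[float]] = None,
               threshold: float = 0.5) -> bool:
    """Label an article harmful when the ensemble score reaches the threshold
    (0.5 is an assumed default, not a value from the paper)."""
    return soft_vote(probabilities, weights) >= threshold


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive (harmful = 1) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, passing unequal `weights` would let a stronger base model dominate the vote; whether the paper weights its base models at all is not stated in the abstract.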
Pages: 18