Predictive intelligence in harmful news identification by BERT-based ensemble learning model with text sentiment analysis

Cited by: 63

Authors:

Lin, Szu-Yin [1]

Kung, Yun-Ching [2]

Leu, Fang-Yie [3]

Affiliations:

[1] Natl Ilan Univ, Dept Comp Sci & Informat Engn, Yilan, Taiwan

[2] Chung Yuan Christian Univ, Dept Informat Management, Taoyuan, Taiwan

[3] Tunghai Univ, Dept Comp Sci, Taichung, Taiwan

Source:

INFORMATION PROCESSING & MANAGEMENT | 2022 / Vol. 59 / Issue 02

Keywords:

Information disorder; Harmful news analysis; Natural language processing; News sentiment analysis; Ensemble learning; BERT model; INFORMATION;

DOI:

10.1016/j.ipm.2022.102872

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In an environment full of disordered information, the media spreads fake or harmful information into the public arena faster than ever before. A news report should ideally be neutral and factual, without excessive personal emotions or viewpoints. News articles should not be written intentionally or maliciously to create a media framing. Harmful news is defined as news text containing explicit or implicit harmful speech that harms people or affects readers' perception. However, it is currently difficult to identify and predict fake or harmful news in advance, especially harmful news. Therefore, in this study, we propose a Bidirectional Encoder Representations from Transformers (BERT) based model that applies ensemble learning methods with text sentiment analysis to identify harmful news, aiming to give readers a way to identify harmful news content and to help them judge whether the information provided is presented in a neutral manner. The proposed system works in two phases. The first phase collects harmful news and establishes a development model for analyzing the correlation between text sentiment and harmful news. The second phase identifies harmful news by analyzing text sentiment with an ensemble learning technique and the BERT model, in order to determine whether the news has harmful intentions. Our experimental results show that the F1-score of the proposed model reaches 66.3%, an increase of 7.8% over the previous term frequency-inverse document frequency approach, which adopts a Lagrangian Support Vector Machine (LSVM) model without text sentiment. Moreover, the proposed method achieves better performance in recognizing various cases of information disorder.
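
The abstract does not say how the ensemble combines its members, so the following is only a minimal sketch, assuming equal-weight soft voting over three hypothetical classifiers and a 0.5 decision threshold. The function names, the probability values, and the labels are all made up for illustration and are not taken from the paper; the sketch only shows how per-model "harmful" probabilities could be averaged and how an F1-score like the one reported is computed.

```python
from typing import List, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the 'harmful' probability of each article across models.

    Equal-weight soft voting is an assumption here, not the paper's
    documented ensemble strategy.
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class (1 = harmful), from TP, FP and FN counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical 'harmful' probabilities for four articles from three
# illustrative models (values invented for this example).
bert_probs = [0.9, 0.2, 0.6, 0.4]
sentiment_probs = [0.8, 0.3, 0.3, 0.7]
tfidf_probs = [0.7, 0.1, 0.5, 0.6]

averaged = soft_vote([bert_probs, sentiment_probs, tfidf_probs])
predictions = [1 if p >= 0.5 else 0 for p in averaged]  # assumed threshold

y_true = [1, 0, 1, 1]  # invented gold labels
f1 = f1_score(y_true, predictions)
print(predictions, round(f1, 3))
```

On this toy data the third article is missed, so precision is 1, recall is 2/3, and the F1-score is 0.8. Real results depend entirely on the trained models and the dataset.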


Pages: 18
