Extreme Gradient Boosting for Cyberpropaganda Detection

Cited by: 2
Authors
Fattahi, Jaouhar [1]
Mejri, Mohamed [1]
Ziadia, Marwa [1]
Affiliations
[1] Laval Univ, Dept Comp Sci & Software Engn, 2325 Rue Univ, Quebec City, PQ G1V 0A6, Canada
Source
NEW TRENDS IN INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES | 2021 / Vol. 337
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Security; cyberpropaganda; XGBoost; BoW; TF-IDF; NLP; LIGHTGBM; CLASSIFICATION;
DOI
10.3233/FAIA210012
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Propaganda, defamation, abuse, insults, disinformation and fake news are not new phenomena; they have been around for several decades. However, with the advent of the Internet and social networks, their magnitude has increased, and the damage they cause to individuals and corporate entities is growing and can even be irreparable. In this paper, we tackle the detection of text-based cyberpropaganda using Machine Learning and NLP techniques. We use the eXtreme Gradient Boosting (XGBoost) algorithm for learning and detection, in tandem with Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) for text vectorization. We highlight the contribution of gradient boosting and regularization mechanisms to the performance of the explored model.
Pages: 99-112
Page count: 14
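
The abstract names Bag-of-Words and TF-IDF as the text vectorization step that feeds the classifier. As an illustration only, the sketch below computes both representations in plain Python. The two-sentence corpus, the whitespace tokenizer, and the function names are hypothetical, and the weighting used (raw term count times log(N / document frequency)) is one common TF-IDF variant, not necessarily the one used in the paper. The classifier itself is omitted; vectors of this kind would then be passed to a gradient-boosted model such as XGBoost.

```python
import math
from collections import Counter


def tokenize(text):
    # Deliberately naive: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(docs):
    # Sorted list of the distinct tokens found in the corpus.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def bow_vector(doc, vocab):
    # Bag-of-Words: raw count of each vocabulary term in the document.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def tfidf_matrix(docs, vocab):
    # TF-IDF variant assumed here: tf = raw count, idf = log(N / df).
    n_docs = len(docs)
    bows = [bow_vector(doc, vocab) for doc in docs]
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in bows if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df_j) for df_j in df]
    return [[tf * w for tf, w in zip(row, idf)] for row in bows]


# Hypothetical toy corpus, not taken from the paper's dataset.
corpus = [
    "the enemy lies the enemy cheats",
    "the council met on tuesday",
]
vocab = build_vocabulary(corpus)
bow = [bow_vector(doc, vocab) for doc in corpus]
tfidf = tfidf_matrix(corpus, vocab)
```

In this toy corpus, "the" occurs in both documents, so its idf is log(2/2) = 0 and it drops out of the TF-IDF vectors, while its Bag-of-Words count stays at 2 in the first document. That contrast is the practical difference between the two vectorizations.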