A hybrid classification method for Twitter spam detection based on differential evolution and random forest

Cited by: 31
Authors
Bazzaz Abkenar, Sepideh [1]
Mahdipour, Ebrahim [1]
Jameii, Seyed Mahdi [2]
Haghi Kashani, Mostafa [2]
Affiliations
[1] Islamic Azad Univ, Sci & Res Branch, Dept Comp Engn, Tehran, Iran
[2] Islamic Azad Univ, Shahr E Qods Branch, Dept Comp Engn, Tehran, Iran
Keywords
imbalanced dataset; machine learning; social networks; spam; Twitter
DOI
10.1002/cpe.6381
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Social networking services are online platforms distributed across many computers, often over long distances. Twitter is one of the most popular microblogging sites, allowing users to share their opinions and report real-world events. Its popularity and ease of use have also attracted spammers, which makes spam detection one of the most critical problems on the platform. Providing a spam-free environment requires identifying and filtering both spam tweets and the accounts that post them. A hybrid method based on the Synthetic Minority Over-sampling TEchnique (SMOTE) and Differential Evolution (DE) is presented to improve the spam detection rate on real Twitter datasets: SMOTE addresses the imbalanced class distribution of the datasets, while DE tunes the hyperparameters of a Random Forest (RF) classifier. Evaluation results show that, compared with related work, the presented method significantly improves classification performance on imbalanced datasets. The optimized RF achieves an F1-score of 98.97% and an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.999, demonstrating the high effectiveness of the proposed method.
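The abstract combines two ingredients: SMOTE, which synthesizes new minority-class (spam) samples by interpolating between a minority sample and one of its nearest minority neighbours, and DE, which searches the RF hyperparameter space. The abstract does not say which DE variant, which RF hyperparameters, or which objective the authors used, so the following is only a minimal, standard-library sketch under stated assumptions, not the authors' implementation. It assumes the classic DE/rand/1/bin scheme, tunes two hypothetical hyperparameters (`n_estimators`, `max_depth`), and replaces the expensive "train an RF and score it" step with a hypothetical toy surrogate, `toy_rf_loss`, whose optimum is invented purely so the sketch runs.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority-class points (SMOTE interpolation).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Assumes at least two minority samples.
    """
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (itself excluded), Euclidean distance
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # uniform draw in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def differential_evolution(objective, bounds, pop_size=15, F=0.8, CR=0.9,
                           generations=100, rng=None):
    """Minimise `objective` over a box using the DE/rand/1/bin scheme."""
    rng = rng or random.Random(0)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # mutation: three distinct individuals, all different from i
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                # binomial crossover between the mutant vector and pop[i]
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                else:
                    v = pop[i][d]
                lo, hi = bounds[d]
                trial.append(min(max(v, lo), hi))  # clip back into the bounds
            s = objective(trial)
            if s <= scores[i]:  # greedy one-to-one selection
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_rf_loss(params):
    """HYPOTHETICAL stand-in for 1 - (cross-validated F1 of a random forest).

    A real run would train an RF with these hyperparameters on SMOTE-balanced
    training folds and return 1 - F1. Here a smooth bowl with an invented
    optimum at n_estimators=200, max_depth=12 keeps the sketch runnable.
    """
    n_estimators, max_depth = round(params[0]), round(params[1])  # RF needs integers
    return ((n_estimators - 200) / 100) ** 2 + ((max_depth - 12) / 10) ** 2


if __name__ == "__main__":
    minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
    new_points = smote(minority, n_new=10, k=3)
    print(len(new_points))  # 10
    best, loss = differential_evolution(toy_rf_loss, bounds=[(10, 500), (2, 30)])
    print(round(best[0]), round(best[1]), loss)
```

DE works in continuous space, so integer hyperparameters are rounded inside the objective; this is one common choice, not necessarily the paper's. In practice, `imblearn.over_sampling.SMOTE` and `scipy.optimize.differential_evolution` provide tested implementations of both steps. Oversampling should be applied only to the training folds, so that synthetic points never leak into the data used to score the forest.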
Pages: 20