Detecting Spam Tweets Using Machine Learning and Effective Preprocessing

Times cited: 2
Authors
Kardas, Berk
Bayar, Ismail Erdem
Ozyer, Tansel
Alhajj, Reda
Affiliations
[1] TOBB Univ, Dept Comp Engn, Ankara, Turkey
[2] Istanbul Medipol Univ, Dept Comp Engn, Istanbul, Turkey
[3] Univ Southern Denmark, Dept Hlth Informat, Odense, Denmark
Source
PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL CONFERENCE ON ADVANCES IN SOCIAL NETWORKS ANALYSIS AND MINING, ASONAM 2021 | 2021
Keywords
Twitter; spam detection; machine learning; preprocessing; social media
DOI
10.1145/3487351.3490968
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the rapid growth in popularity of online social networks (OSNs), these platforms have become attractive targets for spammers, who can easily exploit them to publish malicious content and advertise phishing scams. Effective identification and filtering of spam tweets would therefore benefit both OSNs and their users. However, the sheer volume of posts makes it increasingly difficult to check for and eliminate spam tweets. Motivated by these observations, this paper proposes an approach for detecting spam tweets using machine learning and effective preprocessing techniques. The approach examines the benefits of preprocessing and identifies which preprocessing techniques are the most effective. To compare these techniques, the UtkML Twitter spam dataset is used for testing. Once the most effective methods are determined, they are combined to further improve spam tweet detection accuracy. We evaluated our solution with four machine learning algorithms: Naive Bayes, Neural Network, Logistic Regression, and Support Vector Machine (SVM). With the SVM classifier, we achieve an accuracy of 93.02%. Experimental results show that our approach effectively improves the performance of spam tweet classification.
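This record does not list the specific preprocessing techniques or model settings the authors used, so the following is only a minimal, hypothetical sketch of a preprocess-then-classify pipeline of the kind the abstract describes. It assumes three illustrative preprocessing steps (lowercasing, replacing URLs and @-mentions with placeholder tokens, and stopword removal) and uses Multinomial Naive Bayes, one of the four classifiers named above, written from scratch in standard-library Python. The names `preprocess`, `MultinomialNB`, `STOPWORDS`, and the tiny training set are invented for illustration; the examples are not drawn from the UtkML dataset.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical preprocessing rules; the paper's actual techniques are not given here.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9]+")
STOPWORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "on", "for", "this", "my"}


def preprocess(tweet: str) -> list[str]:
    """Lowercase, replace URLs and @-mentions with placeholders, tokenize, drop stopwords."""
    text = tweet.lower()
    text = URL_RE.sub(" urltoken ", text)       # every link becomes one shared token
    text = MENTION_RE.sub(" usertoken ", text)  # every @-mention becomes one shared token
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior: dict[str, float] = {}
        self.word_counts: dict[str, Counter] = {}
        self.total_counts: dict[str, int] = {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            self.word_counts[c] = counts
            self.total_counts[c] = sum(counts.values())
        return self

    def predict(self, doc: list[str]) -> str:
        vocab_size = len(self.vocab)

        def log_score(c: str) -> float:
            s = self.log_prior[c]
            for tok in doc:
                if tok in self.vocab:  # tokens never seen in training are ignored
                    s += math.log((self.word_counts[c][tok] + 1)
                                  / (self.total_counts[c] + vocab_size))
            return s

        return max(self.classes, key=log_score)


# Invented toy examples, for illustration only.
train = [
    ("WIN a FREE iPhone now!!! click http://spam.example/win", "spam"),
    ("Cheap followers, click here http://bit.example/x", "spam"),
    ("Free prize waiting, claim now http://promo.example", "spam"),
    ("Having coffee with @alice this morning", "ham"),
    ("Great talk on graph mining at the conference today", "ham"),
    ("Reading a new paper on social networks", "ham"),
]
model = MultinomialNB().fit([preprocess(t) for t, _ in train], [y for _, y in train])
print(model.predict(preprocess("Claim your free prize now http://x.example")))  # spam
```

In a comparison like the one the abstract describes, each preprocessing step would be switched on or off and the resulting accuracy measured on held-out data; replacing the classifier with an SVM or logistic regression (for example, scikit-learn's `LinearSVC` or `LogisticRegression`) would follow the same pattern. Laplace smoothing is used so that a word absent from one class does not force that class's probability to zero.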
Pages: 393-398
Number of pages: 6