Machine learning for detecting fake accounts and genetic algorithm-based feature selection

Cited by: 2
Authors
Sallah, Amine [1]
Alaoui, El Arbi Abdellaoui [2]
Tekouabou, Stephane C. K. [3, 4]
Agoujil, Said [5]
Affiliations
[1] Moulay Ismail Univ, Fac Sci & Tech, Dept Comp Sci, Meknes, Morocco
[2] Moulay Ismail Univ, Ecole Normale Super, Dept Sci, Meknes, Morocco
[3] Univ Yaounde I, Res Lab Comp Sci & Educ Technol LITE, Yaounde, Cameroon
[4] Univ Yaounde I, Higher Teacher Training Coll HTTC, Dept Comp Sci & Educ Technol DITE, Yaounde, Cameroon
[5] Moulay Ismail Univ, Ecole Natl Commerce Gest, Dept Sci, Meknes, Morocco
Source
DATA & POLICY | 2024 / Vol. 6
Keywords
Boruta; classification; fake account; feature selection; genetic algorithm
DOI
10.1017/dap.2023.46
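Chinese Library Classification (CLC) number
C93 [Management science]; D035 [State administration]; D523 [Administration]; D63 [State administration]
Discipline classification code
12; 1201; 1202; 120202; 1204; 120401
Abstract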
People in Africa rely extensively on online social networks (OSNs), which has drawn the attention of cyber attackers pursuing various nefarious ends. This global trend has not spared African online communities, where the proliferation of OSNs has brought both new opportunities and new challenges. In Africa, as in many other regions, a burgeoning black-market industry has emerged that specializes in creating and selling fake accounts for malicious and deceptive purposes. This paper aims to build a set of machine-learning models, using feature selection algorithms, to predict fake accounts, increase performance, and reduce costs. The suggested approach takes as input features that describe the profiles being investigated. Our findings offer a thorough comparison of various algorithms. Furthermore, compared with machine learning without feature selection and with Boruta, machine learning that employs the suggested genetic algorithm-based feature selection offers a clear runtime advantage. The final prediction model achieves AUC values between 90% and 99.6%. The findings show that the model based on the features chosen by the genetic algorithm (GA) provides reasonable prediction quality with a small number of input variables, less than 31% of the entire feature space, and therefore permits the accurate separation of fake from real users. Our results demonstrate exceptional predictive accuracy with a significant reduction in input variables using the genetic algorithm, reaffirming the effectiveness of our approach.
Policy Significance Statement
Machine-learning algorithms coupled with genetic algorithm-based feature selection offer a powerful approach for detecting fake accounts on online social network platforms. This research demonstrates that, by leveraging advanced machine-learning techniques and employing genetic algorithms for feature selection, it is possible to identify fake accounts with high accuracy and efficiency. Furthermore, integrating genetic algorithm-based feature selection optimizes the performance of the detection system by identifying the most informative features. This improves the efficiency and effectiveness of fake account detection and reduces computational complexity.
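The abstract does not give the paper's GA settings, classifiers, or dataset, so the following is only a minimal sketch of how genetic algorithm-based feature selection can work in general. Everything in it is an assumption made for illustration: the synthetic data, the nearest-centroid classifier standing in for the paper's models, the accuracy-minus-size-penalty fitness, and all function names and parameter values (`make_data`, `centroid_accuracy`, `ga_select`, `penalty`, and so on).

```python
import random

def make_data(n=200, n_features=10, informative=(0, 1, 2), seed=0):
    """Hypothetical stand-in for profile data: only `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)  # 1 = fake, 0 = real (assumed encoding)
        X.append([rng.gauss(2.0 * label if j in informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def centroid_accuracy(X_tr, y_tr, X_va, y_va, mask):
    """Validation accuracy of a nearest-centroid classifier using only the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_tr, y_tr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_va, y_va):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, mu))
                for c, mu in centroids.items()}
        correct += min(dist, key=dist.get) == t
    return correct / len(y_va)

def ga_select(X, y, pop_size=20, generations=10, p_mut=0.1, penalty=0.05, seed=1):
    """Evolve binary feature masks; fitness rewards accuracy and penalizes mask size."""
    rng = random.Random(seed)
    n_features = len(X[0])
    split = int(0.7 * len(X))
    X_tr, y_tr, X_va, y_va = X[:split], y[:split], X[split:], y[split:]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            acc = centroid_accuracy(X_tr, y_tr, X_va, y_va, mask)
            cache[key] = acc - penalty * sum(mask) / n_features
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = sorted(pop, key=fitness, reverse=True)[:2]  # elitism: keep the two best
        while len(next_pop) < pop_size:
            p1 = max(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_features - 1)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    X, y = make_data()
    mask, score = ga_select(X, y)
    print("selected feature indices:", [j for j, keep in enumerate(mask) if keep])
    print("fitness:", round(score, 3))
```

The size penalty is what pushes the search toward small subsets of the kind the abstract reports; the weight used here is arbitrary and would need tuning against a real classifier and a metric such as AUC.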
Pages: 17