Feature Selection Framework for Optimizing ML-based Malicious URL Detection

Times cited: 0
Authors
Shah, Sajjad H. [1]
Garu, Amit [1]
Nguyen, Duong N. [1]
Borowczak, Mike [2]
Affiliations
[1] Univ Wyoming, Dept Elect Engn & Comp Sci, Laramie, WY 82071 USA
[2] Univ Cent Florida, Dept Elect & Comp Engn, Orlando, FL 32816 USA
Source
2024 CYBER AWARENESS AND RESEARCH SYMPOSIUM, CARS 2024 | 2024
Keywords
Malicious URL detection; cybersecurity; machine learning; feature selection
DOI
10.1109/CARS61786.2024.10778786
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Malicious URLs are one of the most common and active vectors for launching cyber-attacks such as spam, phishing, social engineering, and malware. They cause billions of dollars in annual losses and call for effective detection techniques. Machine-learning-based detection is among the most promising candidates, but its performance depends on several factors, including the proper selection of representation features. This step often requires specialized domain expertise and may be carried out manually, particularly in unique and specialized applications. This paper proposes an approach that combines several methods (information gain, genetic algorithms, random forest) to select a small set of representation features so that the ML model can be trained efficiently. Experimental results show that ML models trained on the selected features perform comparably to the traditional approach while requiring less time and fewer computational resources.
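To make the information gain idea in the abstract concrete, the following is a minimal sketch of ranking features by information gain and keeping the top few. It is not the authors' implementation: the abstract does not specify how the three methods are combined, and the genetic algorithm and random forest stages are not shown here. The feature names (`has_ip_host`, `has_at_symbol`, `uses_https`), the toy data, and the cutoff of two features are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels within each feature-value group.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Toy data invented for illustration: 1 = malicious, 0 = benign.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical binary URL features; not taken from the paper.
features = {
    "has_ip_host":   [1, 1, 1, 1, 0, 0, 0, 0],  # separates the classes exactly
    "has_at_symbol": [1, 1, 0, 0, 0, 0, 0, 0],  # partially informative
    "uses_https":    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label here
}

scores = {name: information_gain(vals, labels)
          for name, vals in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# Keep the two highest-scoring features (cutoff chosen arbitrarily).
top_k = ranked[:2]

for name in ranked:
    print(f"{name}: {scores[name]:.4f}")
```

In a fuller pipeline, a filter like this would typically serve as a cheap first pass, with a search method or a model-based importance measure refining the subset afterwards; whether the paper orders the stages this way is not stated in the abstract.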
Pages: 6