Is this URL Safe: Detection of Malicious URLs Using Global Vector for Word Representation

Cited by: 5
Authors
Bharadwaj, Rohit [1]
Bhatia, Ashutosh [1]
Chhibbar, Laxmi Divya [1]
Tiwari, Kamlesh [1]
Agrawal, Ankit [1]
Affiliations
[1] Birla Inst Technol & Sci Pilani, Dept Comp Sci & Informat Syst, Jhunjhunu 333031, Rajasthan, India
Source
36TH INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING (ICOIN 2022) | 2022
Keywords
Machine Learning; URL Classification; GloVe embedding model;
DOI
10.1109/ICOIN53446.2022.9687204
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Users are frequently exposed to many unknown links through advertisements and emails. These links may contain URLs that mount targeted attacks such as spamming, phishing, and malware installation. Blacklisting URLs is the most widely used defense mechanism for detecting malicious URLs. However, automatically generating such a list for fresh malicious URLs is challenging. Detecting a URL as malicious using the lexicographical approach is an important research problem. This paper proposes a malicious URL detection mechanism using natural language processing. We use features including word vector representations obtained through GloVe, along with statistical cues and n-grams on blacklist words. The proposed approach is efficient, and it does not require inputs from external servers to identify malicious URLs. Experiments are performed on a database of 227,909 URLs containing 80,128 benign and 147,781 malicious URLs. The proposed system achieves an accuracy of 89% for the ANN model with GloVe-based features.
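As a rough illustration of the three feature families named in the abstract (word vectors, statistical cues, and n-grams on blacklist words), the following minimal sketch builds a feature vector from a URL string alone. It is not the authors' implementation: the toy 3-dimensional vectors stand in for pretrained GloVe embeddings, and the blacklist words, the chosen statistics, and the names `tokenize`, `char_ngrams`, and `url_features` are all assumptions made for this example.

```python
import re
from statistics import mean

# Hypothetical toy vectors standing in for pretrained GloVe embeddings.
TOY_VECTORS = {
    "login": [0.9, 0.1, 0.0],
    "secure": [0.8, 0.2, 0.1],
    "example": [0.1, 0.7, 0.2],
    "com": [0.0, 0.5, 0.5],
}
DIM = 3

# Hypothetical blacklist words; the paper's actual list is not given here.
BLACKLIST_WORDS = ("login", "verify", "update")


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def char_ngrams(word, n):
    """Return the set of character n-grams of a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def url_features(url, n=3):
    """Concatenate an averaged word vector, simple statistics,
    and a count of blacklist words sharing an n-gram with the URL."""
    tokens = tokenize(url)

    # Word-vector part: average the vectors of tokens found in the table.
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    avg = [mean(col) for col in zip(*known)] if known else [0.0] * DIM

    # Statistical cues (assumed): URL length, dot count, digit count.
    stats = [len(url), url.count("."), sum(c.isdigit() for c in url)]

    # N-gram part: blacklist words that share at least one n-gram with the URL.
    url_grams = set()
    for t in tokens:
        url_grams |= char_ngrams(t, n)
    hits = sum(1 for w in BLACKLIST_WORDS if char_ngrams(w, n) & url_grams)

    return avg + [float(s) for s in stats] + [float(hits)]


features = url_features("http://secure-login.example.com/a1")
```

A vector like `features` could then be passed to a classifier such as the ANN mentioned in the abstract. Because every feature is computed from the URL string itself, no lookup against an external server is needed.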
Pages: 486 / 491
Number of pages: 6