A Lightweight Multi-View Learning Approach for Phishing Attack Detection Using Transformer with Mixture of Experts

Cited by: 5
Authors
Wang, Yanbin [1]
Ma, Wenrui [1]
Xu, Haitao [1]
Liu, Yiwei [2]
Yin, Peng [2, 3]
Affiliations
[1] Zhejiang Univ, Sch Cyber & Technol, Hangzhou 310027, Peoples R China
[2] Def Ind Secrecy Examinat & Certificat Ctr, Beijing 100089, Peoples R China
[3] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 101408, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Funding
National Natural Science Foundation of China;
Keywords
phishing attack detection; multi-view learning; transformer; self-supervised learning; MALICIOUS URL; MODEL;
DOI
10.3390/app13137429
Chinese Library Classification number
O6 [Chemistry];
Subject classification code
0703 ;
Abstract
Phishing poses a significant threat to the financial and privacy security of internet users and often serves as the starting point for cyberattacks. Many machine-learning-based methods for detecting phishing websites rely on URL analysis, offering simplicity and efficiency. However, these approaches are not always effective due to the following reasons: (1) highly concealed phishing websites may employ tactics such as masquerading URL addresses to deceive machine learning models, and (2) phishing attackers frequently change their phishing website URLs to evade detection. In this study, we propose a robust, multi-view Transformer model with an expert-mixture mechanism for accurate phishing website detection utilizing website URLs, attributes, content, and behavioral information. Specifically, we first adapted a pretrained language model for URL representation learning by applying adversarial post-training learning in order to extract semantic information from URLs. Next, we captured the attribute, content, and behavioral features of the websites and encoded them as vectors, which, alongside the URL embeddings, constitute the website's multi-view information. Subsequently, we introduced a mixture-of-experts mechanism into the Transformer network to learn knowledge from different views and adaptively fuse information from various views. The proposed method outperforms state-of-the-art approaches in evaluations of real phishing websites, demonstrating greater performance with less label dependency. Furthermore, we show the superior robustness and enhanced adaptability of the proposed method to unseen samples and data drift in more challenging experimental settings.
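The mixture-of-experts fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding size, the number of experts, the random weights, the dense softmax gate, and all names (`MixtureOfExperts`, `views`, and so on) are assumptions chosen only to show how a gate can weight several expert transforms for each view vector.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def make_matrix(rows, cols, rng):
    """Random matrix standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class MixtureOfExperts:
    """Dense (soft) mixture of linear experts with a softmax gate."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Each expert is a dim x dim linear map; the gate scores each expert.
        self.experts = [make_matrix(dim, dim, rng) for _ in range(num_experts)]
        self.gate = make_matrix(num_experts, dim, rng)

    def forward(self, token):
        # Gate weights are non-negative and sum to 1.
        gate_weights = softmax(linear(self.gate, token))
        outputs = [linear(expert, token) for expert in self.experts]
        # Weighted sum of expert outputs, one value per output dimension.
        mixed = [
            sum(g * out[i] for g, out in zip(gate_weights, outputs))
            for i in range(self.dim)
        ]
        return mixed, gate_weights


# Hypothetical 4-dimensional embeddings, one per view named in the abstract.
views = {
    "url": [0.9, 0.1, 0.0, 0.3],
    "attribute": [0.2, 0.7, 0.1, 0.0],
    "content": [0.0, 0.4, 0.8, 0.2],
    "behavior": [0.5, 0.0, 0.3, 0.6],
}

moe = MixtureOfExperts(dim=4, num_experts=3)
fused = {name: moe.forward(vec) for name, vec in views.items()}

for name, (mixed, gate_weights) in fused.items():
    print(name, [round(g, 3) for g in gate_weights])
```

Because the gate is computed from the input itself, each view can lean on a different combination of experts. A sparse variant would keep only the top-scoring experts per input, which is one common way to reduce computation.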
Pages: 17