A Lightweight Multi-View Learning Approach for Phishing Attack Detection Using Transformer with Mixture of Experts

Cited by: 5
Authors
Wang, Yanbin [1]
Ma, Wenrui [1]
Xu, Haitao [1]
Liu, Yiwei [2]
Yin, Peng [2,3]
Affiliations
[1] Zhejiang Univ, Sch Cyber & Technol, Hangzhou 310027, Peoples R China
[2] Def Ind Secrecy Examinat & Certificat Ctr, Beijing 100089, Peoples R China
[3] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing 101408, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Funding
National Natural Science Foundation of China
Keywords
phishing attack detection; multi-view learning; transformer; self-supervised learning; MALICIOUS URL; MODEL
DOI
10.3390/app13137429
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Phishing poses a significant threat to the financial and privacy security of internet users and often serves as the starting point for cyberattacks. Many machine-learning-based methods for detecting phishing websites rely on URL analysis, which is simple and efficient. However, these approaches are not always effective, for two reasons: (1) highly concealed phishing websites may use tactics such as masquerading URLs to deceive machine learning models, and (2) phishing attackers frequently change the URLs of their phishing websites to evade detection. In this study, we propose a robust, multi-view Transformer model with a mixture-of-experts mechanism for accurate phishing website detection that uses website URLs, attributes, content, and behavioral information. Specifically, we first adapt a pretrained language model for URL representation learning by applying adversarial post-training in order to extract semantic information from URLs. Next, we capture the attribute, content, and behavioral features of the websites and encode them as vectors, which, together with the URL embeddings, constitute the website's multi-view information. We then introduce a mixture-of-experts mechanism into the Transformer network to learn from the different views and adaptively fuse their information. The proposed method outperforms state-of-the-art approaches in evaluations on real phishing websites, achieving better performance with less dependence on labels. Furthermore, we show the superior robustness and enhanced adaptability of the proposed method to unseen samples and data drift in more challenging experimental settings.
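The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes a dense, softmax-gated mixture of experts applied to each view vector, followed by a simple average across views. The abstract does not specify the routing scheme, the expert architecture, or how the experts sit inside the Transformer, so every name here (`MixtureOfExperts`, `fuse_views`, the view labels) is hypothetical, and the random weights only stand in for learned parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_linear(rng, n_out, n_in):
    """Create a random (weights, bias) pair; a stand-in for trained parameters."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(weights, bias, x):
    """Apply y = W x + b to a single vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MixtureOfExperts:
    """Dense MoE layer: every expert runs, and a gate weights their outputs.

    Assumption: the paper may use a different (for example sparse) routing.
    """

    def __init__(self, dim, n_experts, seed=0):
        rng = random.Random(seed)
        self.experts = [make_linear(rng, dim, dim) for _ in range(n_experts)]
        self.gate = make_linear(rng, n_experts, dim)

    def forward(self, x):
        # The gate turns the input into one weight per expert (weights sum to 1).
        gate_weights = softmax(linear(*self.gate, x))
        expert_outputs = [linear(w, b, x) for w, b in self.experts]
        # The fused output is the gate-weighted sum of the expert outputs.
        fused = [
            sum(g * out[i] for g, out in zip(gate_weights, expert_outputs))
            for i in range(len(x))
        ]
        return fused, gate_weights


def fuse_views(moe, views):
    """Pass each view vector through the MoE layer and average the results."""
    outputs = [moe.forward(v)[0] for v in views]
    return [sum(col) / len(outputs) for col in zip(*outputs)]


if __name__ == "__main__":
    # Hypothetical 4-dimensional encodings of the four views named in the abstract.
    views = {
        "url": [0.2, -0.1, 0.4, 0.0],
        "attributes": [0.0, 0.3, -0.2, 0.1],
        "content": [0.5, 0.1, 0.0, -0.3],
        "behavior": [-0.1, 0.2, 0.2, 0.4],
    }
    moe = MixtureOfExperts(dim=4, n_experts=3)
    for name, vec in views.items():
        _, gate_weights = moe.forward(vec)
        print(name, [round(g, 3) for g in gate_weights])
    print("fused:", [round(v, 3) for v in fuse_views(moe, list(views.values()))])
```

Because the gate depends on the input, each view can weight the experts differently, which is one way to read "adaptively fuse information from various views". A real model would learn these weights jointly with the Transformer layers instead of sampling them at random.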
Pages: 17