Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Cited by: 24
Authors
Yu, Bin [1 ,2 ]
Pan, Jie [3 ]
Gray, Daniel [3 ]
Hu, Jiaming [3 ]
Choudhary, Chhaya [3 ]
Nascimento, Anderson C. A. [3 ]
De Cock, Martine [3 ,4 ]
Affiliations
[1] Infoblox, Santa Clara, CA 95054 USA
[2] Infoblox, Tacoma, WA 98402 USA
[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
[4] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
Source
IEEE ACCESS | 2019, Vol. 7
Keywords
Deep learning; random forest; text classification; heuristically labeled data; domain generation algorithms; cybersecurity; command and control;
DOI
10.1109/ACCESS.2019.2911522
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names. Only a few of these are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.
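To illustrate what "dynamically and consistently" generating domain names can look like, here is a minimal sketch of a toy date-seeded generator. It is not taken from the paper and does not reproduce any real malware family; the function name `toy_dga`, the seed string, the SHA-256 construction, and the `.com` suffix are all assumptions chosen for illustration. The point is only that the malware and the botmaster can derive the same candidate list independently from a shared seed and the current date.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 5, tld: str = ".com") -> list[str]:
    """Return `count` pseudo-random domain names for a given seed and day.

    Hypothetical illustration only: the same (seed, day) pair always yields
    the same list, while a different day yields a different list.
    """
    domains = []
    for i in range(count):
        # Combine the shared seed, the date, and a counter into hash input.
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex characters onto the letters 'a'..'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2019, 1, 1)):
        print(name)
```

Names produced this way tend to look random at the character level, which is the kind of signal a character-based classifier could plausibly learn from labeled examples.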
Pages: 51542 - 51556
Number of pages: 15
Related papers
50 in total
  • [21] Domain generated algorithms detection applying a combination of a deep feature selection and traditional machine learning models
    Hassaoui, Mohamed
    Hanini, Mohamed
    El Kafhali, Said
    JOURNAL OF COMPUTER SECURITY, 2023, 31 (01) : 85 - 105
  • [22] DEEP SEMI-SUPERVISED LEARNING FOR DOMAIN ADAPTATION
    Chen, Hung-Yu
    Chien, Jen-Tzung
    2015 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2015,
  • [23] Weakly Supervised Deep Learning-based Intracranial Hemorrhage Localization
    Nemcek, Jakub
    Vicar, Tomas
    Jakubicek, Roman
    PROCEEDINGS OF THE 15TH INTERNATIONAL JOINT CONFERENCE ON BIOMEDICAL ENGINEERING SYSTEMS AND TECHNOLOGIES (BIOIMAGING), VOL 2, 2021, : 111 - 116
  • [24] Weakly Supervised Deep Learning for Tooth-Marked Tongue Recognition
    Zhou, Jianguo
    Li, Shangxuan
    Wang, Xuesong
    Yang, Zizhu
    Hou, Xinyuan
    Lai, Wei
    Zhao, Shifeng
    Deng, Qingqiong
    Zhou, Wu
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [25] A Weakly Supervised Graph Deep Learning Framework for Point Cloud Registration
    Sun, Lan
    Zhang, Zhenxin
    Zhong, Ruofei
    Chen, Dong
    Zhang, Liqiang
    Zhu, Lin
    Wang, Qiang
    Wang, Guo
    Zou, Jianjun
    Wang, Yu
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [27] Systematic comparison of deep learning strategies for weakly supervised Gleason grading
    Otalora, Sebastian
    Atzori, Manfredo
    Khan, Amjad
    Jimenez-del-Toro, Oscar
    Andrearczyk, Vincent
    Mueller, Henning
    MEDICAL IMAGING 2020: DIGITAL PATHOLOGY, 2021, 11320
  • [28] Predicting Microblog Sentiments via Weakly Supervised Multimodal Deep Learning
    Chen, Fuhai
    Ji, Rongrong
    Su, Jinsong
    Cao, Donglin
    Gao, Yue
    IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20 (04) : 997 - 1007
  • [29] Weakly supervised detection with decoupled attention-based deep representation
    Jiang, Wenhui
    Zhao, Zhicheng
    Su, Fei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2018, 77 (03) : 3261 - 3277
  • [30] Deep Learning Models Based on Weakly Supervised Learning and Clustering Visualization for Disease Diagnosis
    Liu, Jingyao
    Feng, Qinghe
    Zhao, Jiashi
    Miao, Yu
    He, Wei
    Shi, Weili
    Jiang, Zhengang
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 76 (03): : 2649 - 2665