Weakly Supervised Deep Learning for the Detection of Domain Generation Algorithms

Cited by: 24
Authors
Yu, Bin [1, 2]
Pan, Jie [3 ]
Gray, Daniel [3 ]
Hu, Jiaming [3 ]
Choudhary, Chhaya [3 ]
Nascimento, Anderson C. A. [3 ]
De Cock, Martine [3, 4]
Affiliations
[1] Infoblox, Santa Clara, CA 95054 USA
[2] Infoblox, Tacoma, WA 98402 USA
[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA
[4] Univ Ghent, Dept Appl Math Comp Sci & Stat, B-9000 Ghent, Belgium
Source
IEEE ACCESS | 2019 / Volume 7
Keywords
Deep learning; random forest; text classification; heuristically labeled data; domain generation algorithms; cybersecurity; command and control
DOI
10.1109/ACCESS.2019.2911522
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human engineered features.
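To make the abstract's description of DGAs concrete, the following is a minimal sketch of how malware and botmaster could derive the same list of candidate domain names from a shared secret and the current date. It is a toy illustration only, not the algorithm of any real malware family and not a method from the paper; the function name `toy_dga`, the seed string, the label length, and the `.com` suffix are all hypothetical choices.

```python
import hashlib
from datetime import date
from typing import List


def toy_dga(seed: str, day: date, count: int = 5,
            length: int = 12, tld: str = ".com") -> List[str]:
    """Hypothetical toy generator: derive `count` names from a seed and a date.

    Both sides can run this independently and obtain the same list,
    which is the "dynamic and consistent" property the abstract describes.
    """
    domains = []
    for i in range(count):
        # Hash the shared seed, the date, and a counter together.
        material = f"{seed}|{day.isoformat()}|{i}".encode("utf-8")
        digest = hashlib.sha256(material).digest()
        # Map the first `length` digest bytes onto lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    for name in toy_dga("example-seed", date(2019, 1, 1)):
        print(name)
```

Because the output depends only on the seed and the date, a defender who does not know the seed cannot enumerate the names in advance, which is one reason a classifier that scores the domain string itself is attractive.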
Pages: 51542-51556
Page count: 15