Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings

Cited by: 0
Authors
Koh, Joewie J. [1, 2]
Rhodes, Barton [1]
Affiliations
[1] Optfit LLC, Denver, CO 80209 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
cybersecurity; domain generation algorithm; malware; transfer learning; word embedding
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Domain generation algorithms (DGAs) are frequently employed by malware to generate domains used for connecting to command-and-control (C2) servers. Recent work in DGA detection leveraged deep learning architectures like convolutional neural networks (CNNs) and character-level long short-term memory networks (LSTMs) to classify domains. However, these classifiers perform poorly with wordlist-based DGA families, which generate domains by pseudorandomly concatenating dictionary words. We propose a novel approach that combines context-sensitive word embeddings with a simple fully-connected classifier to perform classification of domains based on word-level information. The word embeddings were pre-trained on a large unrelated corpus and left frozen during the training on domain data. The resulting small number of trainable parameters enabled extremely short training durations, while the transfer of language knowledge stored in the representations allowed for high-performing models with small training datasets. We show that this architecture reliably outperformed existing techniques on wordlist-based DGA families with just 30 DGA training examples and achieved state-of-the-art performance with around 100 DGA training examples, all while requiring an order of magnitude less time to train compared to current techniques. Of special note is the technique's performance on the matsnu DGA: the classifier attained an 89.5% detection rate with a 1:1,000 false positive rate (FPR) after training on only 30 examples of the DGA domains, and a 91.2% detection rate with a 1:10,000 FPR after 90 examples. Considering that some of these DGAs have wordlists of several hundred words, our results demonstrate that this technique does not rely on the classifier learning the DGA wordlists. Instead, the classifier is able to learn the semantic signatures of the wordlist-based DGA families.
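The abstract reports results as a detection rate at a fixed false positive rate (for example, 89.5% at 1:1,000). As a rough illustration of how such a figure can be read off classifier scores, the sketch below picks the lowest threshold that keeps the benign false positive rate within a budget and then measures the share of DGA domains scoring above it. This is not the authors' evaluation code: the function name `detection_rate_at_fpr`, the convention that higher scores mean "more likely DGA", and all score values are assumptions made for illustration only.

```python
def detection_rate_at_fpr(benign_scores, dga_scores, max_fpr):
    """Return (threshold, detection_rate) at a false positive budget.

    Hypothetical helper: a domain is flagged when its score is strictly
    greater than the threshold, and higher scores mean "more likely DGA".
    """
    if not benign_scores or not dga_scores:
        raise ValueError("both score lists must be non-empty")

    # Number of benign domains we are allowed to flag by mistake.
    allowed_fp = int(max_fpr * len(benign_scores))

    # Sort benign scores from highest to lowest. Using the score at index
    # allowed_fp as the threshold means at most allowed_fp benign domains
    # score strictly above it.
    ranked = sorted(benign_scores, reverse=True)
    if allowed_fp < len(ranked):
        threshold = ranked[allowed_fp]
    else:
        threshold = float("-inf")  # budget allows flagging everything

    detected = sum(1 for score in dga_scores if score > threshold)
    return threshold, detected / len(dga_scores)


# Made-up scores, not data from the paper.
benign = [0.05, 0.10, 0.20, 0.30, 0.15, 0.60, 0.02, 0.90]
dga = [0.95, 0.70, 0.25, 0.85]

threshold, rate = detection_rate_at_fpr(benign, dga, max_fpr=0.25)
print(threshold, rate)
```

A 1:1,000 false positive rate as quoted in the abstract would correspond to `max_fpr=0.001`, which only gives a meaningful threshold when the benign set holds well over a thousand domains.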
Pages: 2966-2971
Number of pages: 6