Inline Detection of Domain Generation Algorithms with Context-Sensitive Word Embeddings

Cited by: 0
Authors
Koh, Joewie J. [1, 2]
Rhodes, Barton [1]
Affiliations
[1] Optfit LLC, Denver, CO 80209 USA
[2] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2018
Keywords
cybersecurity; domain generation algorithm; malware; transfer learning; word embedding;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Domain generation algorithms (DGAs) are frequently employed by malware to generate domains used for connecting to command-and-control (C2) servers. Recent work in DGA detection leveraged deep learning architectures like convolutional neural networks (CNNs) and character-level long short-term memory networks (LSTMs) to classify domains. However, these classifiers perform poorly with wordlist-based DGA families, which generate domains by pseudorandomly concatenating dictionary words. We propose a novel approach that combines context-sensitive word embeddings with a simple fully-connected classifier to perform classification of domains based on word-level information. The word embeddings were pre-trained on a large unrelated corpus and left frozen during the training on domain data. The resulting small number of trainable parameters enabled extremely short training durations, while the transfer of language knowledge stored in the representations allowed for high-performing models with small training datasets. We show that this architecture reliably outperformed existing techniques on wordlist-based DGA families with just 30 DGA training examples and achieved state-of-the-art performance with around 100 DGA training examples, all while requiring an order of magnitude less time to train compared to current techniques. Of special note is the technique's performance on the matsnu DGA: the classifier attained an 89.5% detection rate with a 1:1,000 false positive rate (FPR) after training on only 30 examples of the DGA domains, and a 91.2% detection rate with a 1:10,000 FPR after 90 examples. Considering that some of these DGAs have wordlists of several hundred words, our results demonstrate that this technique does not rely on the classifier learning the DGA wordlists. Instead, the classifier is able to learn the semantic signatures of the wordlist-based DGA families.
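The idea of pairing frozen, pre-trained word representations with a small trainable classifier can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding table below is a hypothetical, hand-made static lookup that stands in for real context-sensitive embeddings, the domains are assumed to be already segmented into words, and a single logistic unit stands in for the fully-connected classifier. All names (FROZEN_EMBEDDINGS, featurize, train, predict) are invented for illustration.

```python
import math

# Hypothetical frozen embedding table: a toy stand-in for pre-trained
# embeddings. The 2-d vectors are invented; they are never updated below.
FROZEN_EMBEDDINGS = {
    "mail": (0.90, 0.10), "news": (0.80, 0.20),
    "shop": (0.85, 0.15), "cloud": (0.90, 0.20),
    "bucket": (0.10, 0.90), "ladder": (0.20, 0.80),
    "pencil": (0.15, 0.85), "carpet": (0.10, 0.80),
}


def featurize(words):
    """Mean-pool the frozen word vectors of a pre-segmented domain."""
    vecs = [FROZEN_EMBEDDINGS[w] for w in words]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.5):
    """Fit only the classifier weights and bias with plain SGD."""
    dim = len(featurize(examples[0][0]))
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for words, label in examples:
            x = featurize(words)
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, words):
    """Return the estimated probability that the domain is DGA-generated."""
    weights, bias = model
    x = featurize(words)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training set: label 0 = benign, label 1 = wordlist-based DGA.
TRAIN = [
    (["mail", "cloud"], 0), (["news", "shop"], 0), (["shop", "mail"], 0),
    (["bucket", "ladder"], 1), (["pencil", "carpet"], 1),
    (["ladder", "pencil"], 1),
]

model = train(TRAIN)
# Word combinations that never appear together in the training set.
benign_score = predict(model, ["cloud", "news"])
dga_score = predict(model, ["carpet", "bucket"])
```

Only the two weights and the bias are trained here, which mirrors the small number of trainable parameters the abstract describes. In this toy setup the classifier separates the two unseen combinations because the words share a region of the embedding space, not because it memorized the exact training pairs.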
Pages: 2966-2971
Number of pages: 6