Malware Classification with Word Embedding Features

Cited by: 1
Authors
Kale, Aparna Sunil [1]
Di Troia, Fabio [1]
Stamp, Mark [1]
Affiliation
[1] San Jose State Univ, Dept Comp Sci, San Jose, CA 95192 USA
Source
ICISSP: PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON INFORMATION SYSTEMS SECURITY AND PRIVACY | 2021
Keywords
Malware; Machine Learning; Word2Vec; HMM2Vec; CNN; Hidden Markov Models
DOI
10.5220/0010377907330742
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Malware classification is an important and challenging problem in information security. Modern malware classification techniques rely on machine learning models that can be trained on features such as opcode sequences, API calls, and byte n-grams, among many others. In this research, we consider opcode features. We implement hybrid machine learning techniques, where we engineer feature vectors by training hidden Markov models (a technique that we refer to as HMM2Vec) and Word2Vec embeddings on these opcode sequences. The resulting HMM2Vec and Word2Vec embedding vectors are then used as features for classification algorithms. Specifically, we consider support vector machine (SVM), k-nearest neighbor (k-NN), random forest (RF), and convolutional neural network (CNN) classifiers. We conduct substantial experiments over a variety of malware families. Our experiments extend well beyond any previous related work in this field.
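The two-stage shape described in the abstract (derive a fixed-length feature vector from each sample's opcode sequence, then pass those vectors to a classifier) can be illustrated with a small sketch. This is not the authors' implementation: as a simplified stand-in for HMM2Vec and Word2Vec, it uses row-normalized opcode bigram transition probabilities (a visible first-order Markov chain, not a hidden one), followed by a hand-written k-NN classifier. The opcode alphabet, the toy samples, the family labels, and the function names `transition_features` and `knn_predict` are all hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical opcode alphabet; any other opcode is folded into "other".
OPCODES = ["mov", "push", "call", "add", "other"]
INDEX = {op: i for i, op in enumerate(OPCODES)}


def transition_features(seq):
    """Flatten row-normalized opcode bigram counts into one feature vector.

    Stand-in for an embedding step: a visible Markov chain, not HMM2Vec or Word2Vec.
    """
    n = len(OPCODES)
    ids = [INDEX.get(op, INDEX["other"]) for op in seq]
    counts = Counter(zip(ids, ids[1:]))  # counts of consecutive opcode pairs
    features = []
    for i in range(n):
        row_total = sum(counts[(i, j)] for j in range(n))
        for j in range(n):
            features.append(counts[(i, j)] / row_total if row_total else 0.0)
    return features


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors nearest to the query (Euclidean)."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy, made-up samples: (opcode sequence, family label).
samples = [
    (["mov", "push", "call", "mov", "push", "call"], "familyA"),
    (["push", "call", "mov", "push", "call", "mov"], "familyA"),
    (["add", "add", "mov", "add", "add", "mov"], "familyB"),
    (["mov", "add", "add", "mov", "add", "add"], "familyB"),
]
train = [(transition_features(seq), label) for seq, label in samples]

query = transition_features(["mov", "push", "call", "mov", "push"])
predicted = knn_predict(train, query)
print(predicted)
```

In this sketch the feature stage and the classifier stage are independent, so either could be swapped out (for example, a trained embedding in place of the bigram features, or an SVM or random forest in place of k-NN) without changing the other.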
Pages: 733-742
Page count: 10