Hidden Markov models for malware classification

Cited by: 58
Authors
Annachhatre, Chinmayee [1]
Austin, Thomas H. [1]
Stamp, Mark [1]
Affiliations
[1] San Jose State Univ, Dept Comp Sci, San Jose, CA 95192 USA
Source
JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES | 2015 / Vol. 11 / Issue 02
Keywords
DOI
10.1007/s11416-014-0215-x
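Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract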
Previous research has shown that hidden Markov model (HMM) analysis is useful for detecting certain challenging classes of malware. In this research, we consider the related problem of malware classification based on HMMs. We train multiple HMMs on a variety of compilers and malware generators. More than 8,000 malware samples are then scored against these models and separated into clusters based on the resulting scores. We observe that the clustering results can be used to classify the malware samples into their appropriate families with good accuracy. Since none of the malware families in the test set were used to generate the HMMs, these results indicate that our approach can effectively classify previously unknown malware, at least in some cases. Thus, such a clustering strategy could serve as a useful tool in malware analysis and classification.
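The score-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' implementation: the two toy HMMs have hand-picked parameters (in practice they would be trained, for example with Baum-Welch), observations are integer symbol indices standing in for opcodes, scores are log-likelihoods per symbol from the scaled forward algorithm, and k-means is used only as a stand-in because the abstract does not name the clustering algorithm.

```python
import numpy as np

def log_likelihood_per_symbol(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | model) divided by len(obs).

    pi: initial state distribution, shape (N,)
    A:  state transition matrix, shape (N, N), rows sum to 1
    B:  emission matrix, shape (N, M), rows sum to 1
    obs: sequence of integer symbols in range(M)
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_prob = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()          # scaling factor avoids numeric underflow
        log_prob += np.log(c)
        alpha = alpha / c
    return log_prob / len(obs)

def score_vector(obs, models):
    """Score one sample against every model; one coordinate per HMM."""
    return np.array([log_likelihood_per_symbol(obs, *m) for m in models])

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means on the rows of X (illustrative stand-in)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical toy models: model_a favours symbol 0, model_b favours symbol 1.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
model_a = (pi, A, np.array([[0.9, 0.1], [0.8, 0.2]]))
model_b = (pi, A, np.array([[0.1, 0.9], [0.2, 0.8]]))
models = [model_a, model_b]

# Hypothetical samples: two dominated by symbol 0, two by symbol 1.
samples = [
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 0, 1, 1, 1, 1, 1],
]
scores = np.array([score_vector(s, models) for s in samples])
labels, centers = kmeans(scores, k=2)
```

Dividing the log-likelihood by the sequence length keeps scores comparable across samples of different sizes, which matters when the resulting score vectors are clustered together.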
Pages: 59 - 73
Page count: 15
Related papers
34 in total
  • [1] Annachhatre C., 2013, HIDDEN MARKOV MODELS
  • [2] [Anonymous], 2012, VIRUS REMOVAL SERVIC
  • [3] [Anonymous], 2013, MALW ATTR EN CHAR
  • [4] Profile hidden Markov models and metamorphic virus detection
    Attaluri, Srilatha
    McGhee, Scott
    Stamp, Mark
    [J]. JOURNAL IN COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2009, 5 (02): : 151 - 169
  • [5] Exploring Hidden Markov Models for Virus Analysis: A Semantic Approach
    Austin, Thomas H.
    Filiol, Eric
    Josse, Sebastien
    Stamp, Mark
    [J]. PROCEEDINGS OF THE 46TH ANNUAL HAWAII INTERNATIONAL CONFERENCE ON SYSTEM SCIENCES, 2013, : 5039 - 5048
  • [6] Structural entropy and metamorphic malware
    Baysa, Donabelle
    Low, Richard M.
    Stamp, Mark
    [J]. JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2013, 9 (04): : 179 - 192
  • [7] The use of the area under the ROC curve in the evaluation of machine learning algorithms
    Bradley, AP
    [J]. PATTERN RECOGNITION, 1997, 30 (07) : 1145 - 1159
  • [8] Canzanese R., 2013, AUTOMATIC ONLINE BEH
  • [9] Cesare S, 2010, P 8 AUSTR S PAR DIST, P61
  • [10] What is the expectation maximization algorithm?
    Do, Chuong B.
    Batzoglou, Serafim
    [J]. NATURE BIOTECHNOLOGY, 2008, 26 (08) : 897 - 899