Malware classification based on call graph clustering

被引:112
作者
Kinable, Joris [1 ,2 ]
Kostakis, Orestis [1 ]
机构
[1] Aalto Univ, Helsinki Inst Informat Technol, Dept Informat & Comp Sci, POB 15400, Aalto 00076, Finland
[2] Katholieke Univ Leuven, Dept Comp Sci, B-8500 Kortrijk, Belgium
来源
JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES | 2011年 / 7卷 / 04期
关键词
D O I
10.1007/s11416-011-0151-y
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Each day, anti-virus companies receive tens of thousands samples of potentially harmful executables. Many of themalicious samples are variations of previously encountered malware, created by their authors to evade patternbased detection. Dealing with these large amounts of data requires robust, automatic detection approaches. This paper studies malware classification based on call graph clustering. By representingmalware samples as call graphs, it is possible to abstract certain variations away, enabling the detection of structural similarities between samples. The ability to cluster similar samples together will make more generic detection techniques possible, thereby targeting the commonalities of the samples within a cluster. To compare call graphs mutually, we compute pairwise graph similarity scores via graph matchings which approximately minimize the graph edit distance. Next, to facilitate the discovery of similar malware samples, we employ several clustering algorithms, including k-medoids and Density-Based Spatial Clustering of Applications with Noise (DBSCAN). Clustering experiments are conducted on a collection of real malware samples, and the results are evaluated against manual classifications provided by human malware analysts. Experiments show that it is indeed possible to accurately detect malware families via call graph clustering. We anticipate that in the future, call graphs can be used to analyse the emergence of new malware families, and ultimately to automate implementation of generic detection schemes.
引用
收藏
页码:233 / 245
页数:13
相关论文
共 45 条
  • [1] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
  • [2] Bailey M, 2007, LECT NOTES COMPUT SC, V4637, P178
  • [3] Bayer U., 2009, 16 ANN NETW DISTR SY
  • [4] Bayer U., 2010, SAC 10, P1871, DOI [10.1145/1774088.1774484, DOI 10.1145/1774088.1774484]
  • [5] Bayer U, 2009, THESIS
  • [6] Opcodes as predictor for malware
    Bilar, Daniel
    [J]. INTERNATIONAL JOURNAL OF ELECTRONIC SECURITY AND DIGITAL FORENSICS, 2007, 1 (02) : 156 - 168
  • [7] Code obfuscation techniques for metamorphic viruses
    Borello, Jean-Marie
    Me, Ludovic
    [J]. JOURNAL OF COMPUTER VIROLOGY AND HACKING TECHNIQUES, 2008, 4 (03): : 211 - 220
  • [8] Bradde S., 2009, ALIGNING GRAPHS FIND
  • [9] Briones I., 2008, P 2008 VIR B C
  • [10] Code normalization for self-mutating malware
    Bruschi, Danilo
    Martignoni, Lorenzo
    Monga, Mattia
    [J]. IEEE SECURITY & PRIVACY, 2007, 5 (02) : 46 - 54