Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features

Cited by: 0
Authors
Tahan, Gil [1]
Rokach, Lior [1]
Shahar, Yuval [1]
Affiliation
[1] Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel
Keywords
computer security; malware detection; common segment analysis; supervised learning; classification
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes several novel machine-learning-based methods for detecting malware in executable files without any preprocessing, such as unpacking or disassembly. The basic method, Mal-ID, is a new static (form-based) analysis methodology that uses common segment analysis to detect malware files. Common segment analysis enables Mal-ID to discard the parts of a malware file that originate from benign code. In addition, Mal-ID uses a new kind of feature, termed a meta-feature, to better capture the properties of the analyzed segments. Rather than analyzing the entire file, as machine-learning-based techniques usually do, the new approach detects malware at the segment level. This study also introduces two Mal-ID extensions that improve on the basic Mal-ID method in various respects. We rigorously evaluated Mal-ID and its two extensions using more than ten performance measures and compared them, under identical settings, with the highly rated boosted decision tree method. The evaluation demonstrated that Mal-ID and its two extensions outperformed the boosted decision tree method in almost all respects. The results also indicated that, when meaningful features are extracted, a single simple detection rule is sufficient for classifying executable files.
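To make the pipeline outlined in the abstract more concrete (segment extraction, common segment analysis, a per-segment meta-feature, and one detection rule), here is a minimal Python toy. It is a sketch under stated assumptions, not the authors' Mal-ID implementation: the segment definition (every contiguous n-byte window), the single meta-feature (fraction of malware training files containing a segment), the thresholds `min_freq` and `min_hits`, and all function names are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def segments(data: bytes, n: int) -> Set[bytes]:
    """Return every contiguous n-byte window of `data` (hypothetical segment definition)."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def build_segment_table(
    malware: Iterable[bytes], benign: Iterable[bytes], n: int
) -> Dict[bytes, float]:
    """Toy common segment analysis (assumed, not the paper's exact procedure).

    Segments that occur in any benign training file are discarded, since they
    may originate from benign code. Each remaining segment is mapped to one
    hypothetical meta-feature: the fraction of malware training files that
    contain it.
    """
    benign_segs: Set[bytes] = set()
    for f in benign:
        benign_segs |= segments(f, n)

    malware_files: List[bytes] = list(malware)
    counts: Counter = Counter()
    for f in malware_files:
        # Count each malware-only segment at most once per file.
        counts.update(segments(f, n) - benign_segs)

    return {seg: c / len(malware_files) for seg, c in counts.items()}


def is_malicious(
    data: bytes,
    table: Dict[bytes, float],
    n: int,
    min_freq: float = 0.5,
    min_hits: int = 1,
) -> bool:
    """Hypothetical single detection rule: flag the file if at least `min_hits`
    of its segments have a meta-feature value of at least `min_freq`."""
    hits = sum(1 for s in segments(data, n) if table.get(s, 0.0) >= min_freq)
    return hits >= min_hits


if __name__ == "__main__":
    malware = [b"AAAAxyzq", b"BBBBxyzq"]
    benign = [b"AAAAhello", b"BBBBworld"]
    table = build_segment_table(malware, benign, n=4)
    print(table[b"xyzq"])                          # 1.0
    print(is_malicious(b"CCCCxyzq", table, n=4))   # True
    print(is_malicious(b"AAAAhello", table, n=4))  # False
```

In the usage example, the two toy malware strings share their prefixes with the benign strings, so segments such as `AAAA` and `BBBB` are discarded as benign code, while the shared suffix `xyzq` survives as a malware-only segment with meta-feature 1.0. Any new file containing it is flagged by the single rule, which mirrors, in a very simplified form, the abstract's point that segment-level analysis lets benign-derived parts of a malware file be ignored.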
Pages: 949-979
Page count: 31