Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features

Cited by: 0

Authors:

Tahan, Gil [1]

Rokach, Lior [1]

Shahar, Yuval [1]

Affiliation:

[1] Ben Gurion Univ Negev, Dept Informat Syst Engn, IL-84105 Beer Sheva, Israel

Source:

JOURNAL OF MACHINE LEARNING RESEARCH | 2012 / Vol. 13

Keywords:

computer security; malware detection; common segment analysis; supervised learning; CLASSIFICATION;

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

This paper proposes several novel methods, based on machine learning, to detect malware in executable files without any need for preprocessing, such as unpacking or disassembling. The basic method (Mal-ID) is a new static (form-based) analysis methodology that uses common segment analysis in order to detect malware files. By using common segment analysis, Mal-ID is able to discard malware parts that originate from benign code. In addition, Mal-ID uses a new kind of feature, termed meta-feature, to better capture the properties of the analyzed segments. Rather than using the entire file, as is usually the case with machine learning based techniques, the new approach detects malware on the segment level. This study also introduces two Mal-ID extensions that improve the Mal-ID basic method in various aspects. We rigorously evaluated Mal-ID and its two extensions with more than ten performance measures, and compared them to the highly rated boosted decision tree method under identical settings. The evaluation demonstrated that Mal-ID and the two Mal-ID extensions outperformed the boosted decision tree method in almost all respects. In addition, the results indicated that by extracting meaningful features, it is sufficient to employ one simple detection rule for classifying executable files.
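
The abstract describes the idea only at a high level. The following is a minimal Python sketch of that idea, not the authors' implementation: it collects fixed-size byte segments from malware and benign files, discards malware segments that also occur in benign code, and applies a single threshold rule to a new file. The function names, the segment size of 4 bytes, and the 0.5 threshold are all illustrative assumptions; the paper's meta-features and its two extensions are not modeled here.

```python
from collections.abc import Iterable


def segments(data: bytes, size: int = 4) -> set[bytes]:
    """Return the set of all overlapping byte segments of a fixed size.

    The segment size is an illustrative assumption, not a value from the paper.
    """
    return {data[i:i + size] for i in range(len(data) - size + 1)}


def build_library(files: Iterable[bytes], size: int = 4) -> set[bytes]:
    """Union of the segments seen across a collection of files."""
    library: set[bytes] = set()
    for data in files:
        library |= segments(data, size)
    return library


def malware_only_segments(
    malware_files: Iterable[bytes],
    benign_files: Iterable[bytes],
    size: int = 4,
) -> set[bytes]:
    """Keep malware segments that never occur in the benign collection."""
    return build_library(malware_files, size) - build_library(benign_files, size)


def is_malware(
    data: bytes,
    signature: set[bytes],
    size: int = 4,
    threshold: float = 0.5,
) -> bool:
    """Single rule: flag a file if enough of its segments are malware-only.

    The threshold is a hypothetical parameter chosen for this sketch.
    """
    found = segments(data, size)
    if not found:
        return False
    ratio = len(found & signature) / len(found)
    return ratio >= threshold


# Toy data standing in for real executables.
benign = [b"HEADERcommon_code"]
malware = [b"HEADERevil_payload"]
signature = malware_only_segments(malware, benign)

flag_malware = is_malware(b"HEADERevil_payload", signature)
flag_benign = is_malware(b"HEADERcommon_code", signature)
```

In this toy example the shared `HEADER` bytes are removed from the malware segment set, so they do not count as evidence against a benign file that carries the same header.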

Pages: 949-979

Page count: 31

