Evolutionary Cost-Sensitive Ensemble for Malware Detection

Cited by: 4
Authors
Krawczyk, Bartosz [1]
Wozniak, Michal [1]
Affiliations
[1] Wroclaw Univ Technol, Dept Syst & Comp Networks, PL-50370 Wroclaw, Poland
Source
INTERNATIONAL JOINT CONFERENCE SOCO'14-CISIS'14-ICEUTE'14 | 2014 / Vol. 299
Keywords
machine learning; classifier ensemble; multiple classifier system; imbalanced classification; cost-sensitive; malware detection; IMBALANCED DATA; MINORITY CLASS; CLASSIFICATION;
DOI
10.1007/978-3-319-07995-0_43
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Malware detection is among the most extensively developed areas of computer security. Unauthorized, malicious software can cause expensive damage to both private users and companies. It can destroy the computer, breach user privacy, and result in the loss of valuable data. The amount of data uploaded and downloaded each day makes manual screening of each incoming software package almost impossible. That is why there is a need for effective intelligent filters that can automatically distinguish between safe and dangerous applications. The number of malware programs encountered by a detection system is typically much smaller than the number of legitimate programs. Therefore, we have to deal with an imbalanced classification problem, in which standard classification algorithms tend to fail. In this paper, we present a novel ensemble based on cost-sensitive decision trees. Individual classifiers are constructed according to an established cost matrix and trained on random feature subspaces to ensure that they are mutually complementary. Instead of using a fixed cost matrix, we derive its parameters via ROC analysis. An evolutionary algorithm is applied for simultaneous classifier selection and assignment of committee member weights for the fusion process. Experimental analysis, carried out on a large malware dataset, shows that our method is capable of outperforming other state-of-the-art algorithms, and hence is an effective approach to the problem of imbalanced malware detection.
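The fusion step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the base classifiers are replaced by precomputed 0/1 votes instead of cost-sensitive decision trees, the misclassification costs (5.0 for a missed malware sample, 1.0 for a false alarm) are assumed values rather than ROC-derived ones, and the evolutionary loop (elitist selection with Gaussian mutation) is a generic stand-in. The names `fuse`, `total_cost`, and `evolve_weights` are hypothetical. A weight of zero removes a classifier from the committee, so selection and weighting are searched together.

```python
import random

def fuse(weights, votes):
    """Weighted majority vote; votes[k][i] is classifier k's 0/1 label for sample i."""
    total = sum(weights)
    if total == 0:
        # Empty committee: fall back to predicting the majority (benign) class.
        return [0] * len(votes[0])
    fused = []
    for i in range(len(votes[0])):
        score = sum(w * v[i] for w, v in zip(weights, votes))
        fused.append(1 if score >= 0.5 * total else 0)
    return fused

def total_cost(pred, truth, cost_fn=5.0, cost_fp=1.0):
    """Total misclassification cost; cost values are assumptions, label 1 = malware."""
    cost = 0.0
    for p, t in zip(pred, truth):
        if t == 1 and p == 0:
            cost += cost_fn  # missed malware
        elif t == 0 and p == 1:
            cost += cost_fp  # false alarm
    return cost

def evolve_weights(votes, truth, pop_size=20, generations=30, seed=0):
    """Search committee weights with a simple elitist evolutionary loop."""
    rng = random.Random(seed)
    n = len(votes)
    # Seed the population with uniform weights plus random candidates.
    population = [[1.0] * n]
    population += [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]

    def fitness(w):
        return total_cost(fuse(w, votes), truth)

    for _ in range(generations):
        population.sort(key=fitness)
        survivors = population[: pop_size // 2]  # the best candidates always survive
        children = []
        for parent in survivors:
            child = [max(0.0, w + rng.gauss(0, 0.2)) for w in parent]
            if rng.random() < 0.2:
                child[rng.randrange(n)] = 0.0  # pruning mutation: drop one member
            children.append(child)
        population = survivors + children
    return min(population, key=fitness)

# Toy data: 10 samples, 3 minority-class (malware) cases, three base classifiers.
truth = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
votes = [
    [1, 0, 0, 1, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0, 0, 0, 0, 0],
]
best = evolve_weights(votes, truth)
print(best, total_cost(fuse(best, votes), truth))
```

Because the uniform-weight committee is in the initial population and the best candidates are carried over each generation, the returned weights never cost more on this data than plain majority voting.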
Pages: 433 - 442
Page count: 10
Related papers
22 in total
  • [1] An exemplar-based learning approach for detection and classification of malicious network streams in honeynets
    Abbasi, Fahim H.
    Harris, Richard
    Marsland, Stephen
    Moretti, Giovanni
    [J]. SECURITY AND COMMUNICATION NETWORKS, 2014, 7 (02) : 352 - 364
  • [2] Combined 5 x 2 cv F test for comparing supervised classification learning algorithms
    Alpaydin, E
    [J]. NEURAL COMPUTATION, 1999, 11 (08) : 1885 - 1892
  • [3] [Anonymous], 1984, OLSHEN STONE CLASSIF, DOI 10.2307/2530946
  • [4] Blaszczynski J, 2010, LECT NOTES ARTIF INT, V6086, P148, DOI 10.1007/978-3-642-13529-3_17
  • [5] SMOTEBoost: Improving prediction of the minority class in boosting
    Chawla, NV
    Lazarevic, A
    Hall, LO
    Bowyer, KW
    [J]. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS, 2003, 2838 : 107 - 119
  • [6] An introduction to ROC analysis
    Fawcett, Tom
    [J]. PATTERN RECOGNITION LETTERS, 2006, 27 (08) : 861 - 874
  • [7] Ho TK, 1998, IEEE T PATTERN ANAL, V20, P832, DOI 10.1109/34.709601
  • [8] Krawczyk B., 2012, 2012 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), P507, DOI 10.1109/BHI.2012.6211629
  • [9] Cost-sensitive decision tree ensembles for effective imbalanced classification
    Krawczyk, Bartosz
    Wozniak, Michal
    Schaefer, Gerald
    [J]. APPLIED SOFT COMPUTING, 2014, 14 : 554 - 562
  • [10] Ling C.X., 2004, ICML, P544