Evolving meta-ensemble of classifiers for handling incomplete and unbalanced datasets in the cyber security domain

Cited by: 21
Authors
Folino, G. [1]
Pisani, F. S. [1]
Affiliations
[1] Inst High Performance Comp & Networking ICAR CNR, Via P Bucci, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
Ensemble; Data mining; Cyber security; Missing features
DOI
10.1016/j.asoc.2016.05.044
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Cyber security classification algorithms usually operate on datasets with many missing features and strongly unbalanced classes. To cope with these issues, we designed a distributed genetic programming (GP) framework, named CAGE-MetaCombiner, which adopts a meta-ensemble model to operate efficiently with missing data. Each ensemble evolves a function for combining the classifiers, which does not require any extra phase of training on the original data. Therefore, when the data change, the function can be recomputed incrementally with moderate computational effort; this aspect, together with the advantages of running on parallel/distributed architectures, makes the algorithm suitable for the real-time constraints typical of cyber security problems. In addition, an important cyber security problem, the classification of the users or the employers of an e-payment system, is illustrated to show the relevance of the case in which entire sources of data or groups of features are missing. Finally, the capacity of the approach to handle groups of missing features and unbalanced datasets is validated on many artificial datasets and on two real datasets, and it is compared with some similar approaches. (C) 2016 Elsevier B.V. All rights reserved.
Pages: 179-190
Page count: 12