Flattened Data in Convolutional Neural Networks: Using Malware Detection as Case Study

Cited by: 8
Authors
Yeh, Chih-Wei [1]
Yeh, Wan-Ting [1]
Hung, Shih-Hao [1]
Lin, Chih-Ta [2]
Affiliations
[1] Natl Taipei Univ, Dept Comp Sci & Informat Engn, New Taipei 23741, Taiwan
[2] Inst Informat Ind, Taipei, Taiwan
Source
2016 RESEARCH IN ADAPTIVE AND CONVERGENT SYSTEMS | 2016
Keywords
Android; malware; dynamic analysis; machine learning; convolutional neural networks;
DOI
10.1145/2987386.2987406
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Convolutional Neural Networks (CNNs) are powerful variants of multilayer perceptron models, inspired by the neural system of the human brain, that reveal local, spatial correlations in a series of data. While CNNs are widely used for image recognition, they can also be applied in other areas, for example the detection of malicious software. In this paper, we show how CNNs may be used to improve the classification of malicious software thanks to their high-level feature abstraction and equal-variance property against noise. Because they rely on convolution kernels, CNNs are by nature designed for pattern recognition on images only. For this application, we introduce a new transformation technique that converts a series of event logs into flattened data with two-dimensional features so that CNNs can be trained to detect malicious behaviors effectively. With the combination property and the proposed flattened input format, a CNN can perform a k-skip-n-gram dimensionality reduction that learns more flexible and complex patterns than traditional solutions. Our preliminary results show that our latest CNN-based malware detection engine reaches 93.012% prediction accuracy and a 12.9% false negative rate (FNR) with a training set of 32,000 samples. To our knowledge, this is the first paper discussing the application and effectiveness of CNNs for malware detection.
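The abstract does not spell out the flattened input format, so the following is only a rough sketch of the general idea under stated assumptions: an event log is one-hot encoded into a two-dimensional (time by event type) matrix, and a kernel spanning all event types slides along the time axis so that it responds to a particular n-gram of events. The event names, the one-hot encoding, and the helper names `flatten_events` and `slide_kernel` are hypothetical illustrations, not the authors' method, and the k-skip behavior is not shown.

```python
def flatten_events(events, vocab):
    """One-hot encode a sequence of event names into a (time x vocab) matrix.

    Assumption: one row per logged event, one column per event type.
    """
    index = {name: i for i, name in enumerate(vocab)}
    matrix = [[0.0] * len(vocab) for _ in events]
    for t, name in enumerate(events):
        matrix[t][index[name]] = 1.0
    return matrix


def slide_kernel(matrix, kernel):
    """Slide a full-width kernel along the time axis (valid cross-correlation)."""
    n = len(kernel)
    responses = []
    for t in range(len(matrix) - n + 1):
        total = 0.0
        for dt in range(n):
            # Multiply the window row by the kernel row, element by element.
            for a, b in zip(matrix[t + dt], kernel[dt]):
                total += a * b
        responses.append(total)
    return responses


# Hypothetical event vocabulary and log, for illustration only.
vocab = ["open", "read", "send", "write"]
events = ["open", "read", "send", "write", "open", "read"]

flat = flatten_events(events, vocab)

# A hand-built kernel that matches the 2-gram ("read", "send").
kernel = flatten_events(["read", "send"], vocab)

response = slide_kernel(flat, kernel)
best_position = response.index(max(response))
```

In a trained CNN the kernel weights would be learned rather than hand-built; here the kernel is fixed so that the peak in `response` marks where the chosen event pair occurs in the log.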
Pages: 130-135
Page count: 6