PDF Malware Detection Using Visualization and Machine Learning

Cited by: 2
Authors
Liu, Ching-Yuan [1]
Chiu, Min-Yi [2]
Huang, Qi-Xian [2]
Sun, Hung-Min [1]
Affiliations
[1] Natl Tsing Hua Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Natl Tsing Hua Univ, Inst Informat Syst & Applicat, Hsinchu, Taiwan
Source
DATA AND APPLICATIONS SECURITY AND PRIVACY XXXV | 2021 / Vol. 12840
Keywords
Malware detection; PDF malware; Malware visualization; Machine learning
DOI
10.1007/978-3-030-81242-3_12
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
As malware incidents are reported worldwide with increasing frequency, malware detection has drawn growing attention as a way to stop malicious attacks before they succeed. Malware varies widely across the software platforms people use: for example, XcodeGhost on iOS apps, FakePlayer on Android apps, and WannaCrypt on PCs. Moreover, people often overlook potential security threats while surfing the internet, processing files, or even reading email. The Portable Document Format (PDF), one of the most commonly used file types in the world, can store text, images, multimedia content, and even scripts. Despite the growing popularity of and demand for PDF files, only a small fraction of people know how easily malware can be concealed in a seemingly normal PDF file. In this paper, we propose a novel technique that combines malware visualization and image classification to examine PDF files and identify which ones might be malicious. By extracting data from a PDF file and traversing each object within it, we obtain the file's overall tree-like structure. Then, according to the signature of each object in the file, we assign colors derived from SimHash to generate RGB images. Finally, our proposed model, trained with the VGG19 CNN architecture, achieved up to 0.973 accuracy and a 0.975 F1-score in distinguishing malicious PDF files; it is easy to implement and viable for personal or enterprise-wide use.
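The abstract does not spell out how an object signature becomes a color, so the following is only a minimal sketch of one plausible reading: compute a 24-bit SimHash over the tokens of a signature and split it into three 8-bit channels. The MD5 token hash, the whitespace tokenization, the 24-bit width, and the names `simhash` and `signature_to_rgb` are all assumptions made for illustration, not details taken from the paper.

```python
import hashlib


def simhash(tokens, bits=24):
    """Return a SimHash fingerprint of `tokens` as an integer of `bits` bits."""
    counts = [0] * bits
    for token in tokens:
        # Assumption: MD5 is used only as a convenient, stable token hash.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest, "big")
        for i in range(bits):
            # Vote +1 if bit i of the token hash is set, otherwise -1.
            counts[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(counts):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def signature_to_rgb(signature):
    """Map a whitespace-separated object signature to an (R, G, B) tuple."""
    fingerprint = simhash(signature.split(), bits=24)
    return (
        (fingerprint >> 16) & 0xFF,  # red: top 8 bits
        (fingerprint >> 8) & 0xFF,   # green: middle 8 bits
        fingerprint & 0xFF,          # blue: low 8 bits
    )


if __name__ == "__main__":
    # Hypothetical signature built from PDF name tokens.
    print(signature_to_rgb("/Type /Action /S /JavaScript /JS"))
```

Because SimHash aggregates per-bit votes across tokens, signatures that share most of their tokens tend to receive nearby fingerprints, which is presumably why it is attractive for coloring structurally similar objects alike.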
Pages: 209-220
Page count: 12