MalJPEG: Machine Learning Based Solution for the Detection of Malicious JPEG Images

Cited by: 26
Authors
Cohen, Aviad [1]
Nissim, Nir [1, 2]
Elovici, Yuval [3]
Affiliations
[1] Ben Gurion Univ Negev, Cyber Secur Res Ctr, Malware Lab, IL-8410501 Beer Sheva, Israel
[2] Ben Gurion Univ Negev, Dept Ind Engn & Management, IL-8410501 Beer Sheva, Israel
[3] Ben Gurion Univ Negev, Dept Software & Informat Engn, IL-8410501 Beer Sheva, Israel
Keywords
JPEG; image; malware; detection; machine learning; features; STEGANOGRAPHY; METHODOLOGY; COMPRESSION; FEATURES;
DOI
10.1109/ACCESS.2020.2969022
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, cyber-attacks against individuals, businesses, and organizations have increased. Cyber criminals are always looking for effective vectors to deliver malware to victims in order to launch an attack. Images are used on a daily basis by millions of people around the world, and most users consider images to be safe for use; however, some types of images can contain a malicious payload and perform harmful actions. JPEG is the most popular image format, primarily due to its lossy compression. It is used by almost everyone, from individuals to large organizations, and can be found on almost every device (on digital cameras and smartphones, websites, social media, etc.). Because of their harmless reputation, massive use, and high potential for misuse, JPEG images are used by cyber criminals as an attack vector. While machine learning methods have been shown to be effective at detecting known and unknown malware in various domains, to the best of our knowledge, machine learning methods have not been used specifically for the detection of malicious JPEG images. In this paper, we present MalJPEG, the first machine learning-based solution tailored specifically for the efficient detection of unknown malicious JPEG images. MalJPEG statically extracts 10 simple yet discriminative features from the JPEG file structure and leverages them with a machine learning classifier, in order to discriminate between benign and malicious JPEG images. We evaluated MalJPEG extensively on a real-world representative collection of 156,818 images which contains 155,013 (98.85%) benign and 1,805 (1.15%) malicious images. The results show that MalJPEG, when used with the LightGBM classifier, demonstrates the highest detection capabilities, with an area under the receiver operating characteristic curve (AUC) of 0.997, true positive rate (TPR) of 0.951, and a very low false positive rate (FPR) of 0.004.
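The abstract does not list the ten features MalJPEG uses, so the following is only a minimal sketch of what "statically extracting features from the JPEG file structure" could look like. It walks the marker segments of a JPEG file without decoding the image and derives a few illustrative counts. The function names (`walk_segments`, `toy_structure_features`) and the chosen features are hypothetical and are not taken from the paper.

```python
import struct
from collections import Counter

# Markers that carry no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01, 0xD8, 0xD9} | set(range(0xD0, 0xD8))


def walk_segments(data: bytes):
    """Yield (marker, payload_size) pairs from SOI up to and including SOS.

    The entropy-coded scan data after SOS is not parsed.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    yield 0xD8, 0
    pos = 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before the real marker
            pos += 1
            continue
        pos += 2
        if marker in STANDALONE:
            yield marker, 0
            if marker == 0xD9:  # EOI: nothing more to walk
                return
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment header")
        # Big-endian length that includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2 or pos + length > len(data):
            raise ValueError("invalid segment length")
        yield marker, length - 2
        pos += length
        if marker == 0xDA:  # SOS: compressed image data follows
            return


def toy_structure_features(data: bytes) -> dict:
    """Hypothetical structural features; NOT the paper's feature set."""
    segments = list(walk_segments(data))
    counts = Counter(marker for marker, _ in segments)
    # Simplification: measure bytes after the LAST EOI marker in the file.
    eoi = data.rfind(b"\xff\xd9")
    return {
        "file_size": len(data),
        "num_segments": len(segments),
        "num_app_segments": sum(
            n for m, n in counts.items() if 0xE0 <= m <= 0xEF
        ),
        "num_comment_segments": counts[0xFE],
        "max_segment_payload": max(size for _, size in segments),
        "bytes_after_last_eoi": len(data) - (eoi + 2) if eoi != -1 else 0,
    }
```

A dictionary like this, computed per file, could be turned into one row of a feature matrix and passed to a classifier such as LightGBM; how the authors actually construct and select their features is described in the paper, not here.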
Pages: 19997-20011
Page count: 15