ENCoD: Distinguishing Compressed and Encrypted File Fragments

Cited by: 12
Authors
De Gaspari, Fabio [1]
Hitaj, Dorjan [1]
Pagnotta, Giulio [1]
De Carli, Lorenzo [2]
Mancini, Luigi V. [1]
Affiliations
[1] Sapienza Univ Roma, Dipartimento Informat, Rome, Italy
[2] Worcester Polytech Inst, Dept Comp Sci, Worcester, MA 01609 USA
Source
NETWORK AND SYSTEM SECURITY, NSS 2020 | 2020 / Vol. 12570
Funding
EU Horizon 2020
DOI
10.1007/978-3-030-65745-1_3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of using high entropy as a proxy for randomness. However, many modern content types (e.g., office documents and media files) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, which reduces the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically evaluated over only a few selected data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design ENCoD, a learning-based classifier that can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate ENCoD against current approaches over a large dataset of different data types, showing that it outperforms the current state of the art for most considered fragment sizes and data types.
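The entropy-based baseline that the abstract describes can be illustrated with a minimal sketch. This is not ENCoD and not any of the statistical tests evaluated in the paper; it only computes the Shannon entropy of a byte fragment, which is the kind of signal the abstract says fails to separate compressed from encrypted data. The function name `byte_entropy`, the synthetic text, the use of `zlib` as the compressor, and `os.urandom` as a stand-in for ciphertext are all assumptions made for illustration.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)
    # Sum p * log2(1/p) over the byte values that actually occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Illustrative inputs (assumptions, not the paper's dataset).
FRAGMENT_SIZE = 512  # smallest fragment size mentioned in the abstract
text = " ".join(f"record {i} status ok" for i in range(20000)).encode()
compressed = zlib.compress(text, 9)
random_like = os.urandom(FRAGMENT_SIZE)  # stand-in for an encrypted fragment

if __name__ == "__main__":
    for label, data in [
        ("plain text", text[:FRAGMENT_SIZE]),
        ("zlib output", compressed[:FRAGMENT_SIZE]),
        ("random bytes", random_like),
    ]:
        print(f"{label:>12}: {byte_entropy(data):.3f} bits/byte")
```

Running the script prints one entropy value per fragment. The plain-text fragment should score well below the other two, while the compressed and random fragments tend to land close together, which is why a simple entropy threshold is a weak discriminator for this task.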
Pages: 42-62
Page count: 21