EnCoD: Distinguishing Compressed and Encrypted File Fragments

Cited by: 12

Authors:

De Gaspari, Fabio [1]

Hitaj, Dorjan [1]

Pagnotta, Giulio [1]

De Carli, Lorenzo [2]

Mancini, Luigi V. [1]

Affiliations:

[1] Sapienza Univ Roma, Dipartimento Informat, Rome, Italy

[2] Worcester Polytech Inst, Dept Comp Sci, Worcester, MA 01609 USA

Source:

NETWORK AND SYSTEM SECURITY, NSS 2020 | 2020 / Vol. 12570

Funding:

EU Horizon 2020

Keywords:

DOI:

10.1007/978-3-030-65745-1_3

CLC number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating entropy and treating high entropy as a proxy for randomness. However, many modern content types (e.g., office documents and media files) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors. Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically evaluated over only a few selected data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell encryption and compression apart, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms the current state of the art for most considered fragment sizes and data types.
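
The entropy-based baseline mentioned in the abstract can be illustrated with a minimal sketch. It is not the classifier proposed in the paper; it only computes the Shannon entropy of a byte fragment to show why compressed data is hard to separate from encrypted data on entropy alone. The sample data, the 4096-byte fragment size, the vocabulary, and the use of `os.urandom` as a stand-in for ciphertext are all assumptions made for illustration.

```python
import math
import os
import random
import zlib
from collections import Counter


def byte_entropy(fragment: bytes) -> float:
    """Return the Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not fragment:
        return 0.0
    total = len(fragment)
    counts = Counter(fragment)  # iterating over bytes yields ints in 0..255
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


FRAGMENT_SIZE = 4096  # illustrative fragment size, not taken from the paper

# Hypothetical plaintext: pseudo-random words drawn from a small vocabulary.
rng = random.Random(0)
vocabulary = [b"entropy", b"fragment", b"file", b"cipher", b"block", b"stream",
              b"random", b"archive", b"header", b"payload", b"media", b"office"]
plaintext = b" ".join(rng.choice(vocabulary) for _ in range(60000))
plaintext_fragment = plaintext[:FRAGMENT_SIZE]

# Compressed fragment, cut from the middle of a zlib (DEFLATE) stream.
compressed = zlib.compress(plaintext, 9)
compressed_fragment = compressed[1024:1024 + FRAGMENT_SIZE]

# Stand-in for ciphertext (assumption): bytes from the OS random source.
random_fragment = os.urandom(FRAGMENT_SIZE)

for name, fragment in [("plaintext", plaintext_fragment),
                       ("compressed", compressed_fragment),
                       ("random", random_fragment)]:
    print(f"{name:>10}: {byte_entropy(fragment):.3f} bits/byte")
```

In a run of this sketch, the plaintext fragment should score well below the other two, while the compressed and random fragments should both land close to the 8 bits/byte maximum. That closeness is the reason a simple entropy threshold struggles to separate them.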

Pages: 42-62

Page count: 21

