An evaluation of forensic similarity hashes

Cited by: 73
Authors
Roussev, Vassil [1]
Affiliations
[1] Univ New Orleans, Dept Comp Sci, New Orleans, LA 70148 USA
Keywords
Digital forensics; Similarity hash; Similarity digest; Sdhash; Ssdeep
DOI
10.1016/j.diin.2011.05.005
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The fast growth of the average size of digital forensic targets demands new automated means to quickly, accurately and reliably correlate digital artifacts. Such tools need to offer more flexibility than the routine known-file filtering based on crypto hashes. Currently, there are two tools for which NIST has produced reference hash sets: ssdeep and sdhash. The former provides a fixed-sized fuzzy hash based on random polynomials, whereas the latter produces a variable-length similarity digest based on statistically-identified features packed into Bloom filters. This study provides a baseline evaluation of the capabilities of these tools both in a controlled environment and on real-world data. The results show that the similarity digest approach significantly outperforms in terms of recall and precision in all tested scenarios and demonstrates robust and scalable behavior. © 2011 V. Roussev. Published by Elsevier Ltd. All rights reserved.
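As a rough illustration of the "features packed into Bloom filters" idea mentioned in the abstract, the following Python sketch builds a toy similarity digest and compares two of them. It is not sdhash's algorithm: sdhash selects features statistically and chains several filters, whereas this sketch simply hashes every fixed-length window into a single filter. The constants `FILTER_BITS`, `NUM_HASHES`, `FEATURE_LEN` and the function names are assumptions chosen for the example.

```python
import hashlib

# Illustrative constants (assumptions, not sdhash's actual parameters).
FILTER_BITS = 2048   # size of the toy Bloom filter in bits
NUM_HASHES = 5       # bit positions set per feature
FEATURE_LEN = 8      # feature length in bytes


def bit_positions(feature: bytes) -> list:
    """Derive NUM_HASHES bit positions from one SHA-1 digest of the feature."""
    digest = hashlib.sha1(feature).digest()  # 20 bytes, split into 4-byte chunks
    return [
        int.from_bytes(digest[4 * i:4 * i + 4], "big") % FILTER_BITS
        for i in range(NUM_HASHES)
    ]


def toy_digest(data: bytes) -> int:
    """Insert every FEATURE_LEN-byte window of data into one Bloom filter.

    The filter is returned as an integer bitmask. Unlike sdhash, no
    statistical feature selection is performed (simplifying assumption).
    """
    bits = 0
    for start in range(len(data) - FEATURE_LEN + 1):
        for pos in bit_positions(data[start:start + FEATURE_LEN]):
            bits |= 1 << pos
    return bits


def toy_similarity(a: int, b: int) -> float:
    """Share of set bits common to both filters, relative to the sparser one."""
    smaller = min(bin(a).count("1"), bin(b).count("1"))
    if smaller == 0:
        return 0.0
    return bin(a & b).count("1") / smaller
```

Comparing a digest with itself, for example `toy_similarity(d, d)` for any non-empty `d`, gives 1.0, and inputs that share long byte runs score higher than unrelated ones. A single 2048-bit filter saturates quickly on larger inputs, which is one reason a practical tool would need a variable-length digest made of multiple filters.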
Pages: S34-S41
Number of pages: 8
Related papers
7 items in total
[1]   SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS [J].
BLOOM, BH .
COMMUNICATIONS OF THE ACM, 1970, 13 (07) :422-&
[2]   BRODER A, 2002, ANN ALL C COMM CONTR
[3]   GARFINKEL S, 2009, P DIG FOR RES C DFRW, pS2
[4]   Identifying almost identical files using context triggered piecewise hashing [J].
Kornblum, Jesse .
DIGITAL INVESTIGATION, 2006, :S91-S97
[5]   RABIN MO, 1981, TR1581 HARV U CTR RE
[6]   Roussev V., 2009, P 42 HAW INT C SYST, P1, DOI 10.1109/HICSS.2009.97
[7]   Roussev V, 2010, IFIP ADV INF COMM TE, V337, P207