TLSH - A Locality Sensitive Hash

Cited by: 107
Authors
Oliver, Jonathan [1]
Cheng, Chun [1]
Chen, Yanggui [1]
Affiliation
[1] Trend Micro, N Ryde, NSW 2113, Australia
Source
2013 FOURTH CYBERCRIME AND TRUSTWORTHY COMPUTING WORKSHOP (CTC 2013) | 2014
Keywords
locality sensitive hash; fuzzy hashing; data fingerprinting; similarity digests; Ssdeep; Sdhash; Nilsimsa; TLSH;
DOI
10.1109/CTC.2013.9
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Cryptographic hashes such as MD5 and SHA-1 are used in many data mining and security applications as identifiers for files and documents. However, if a single byte of a file is changed, a cryptographic hash produces a completely different hash value. It would be very useful to work with hashes that identify similar files based on their hash values. The security field has proposed similarity digests, and the data mining community has proposed locality sensitive hashes. Proposals include the Nilsimsa hash (a locality sensitive hash) and Ssdeep and Sdhash (both similarity digests). Here, we describe a new locality sensitive hashing scheme, TLSH. We provide algorithms for evaluating and comparing hash values and provide a reference to its open source code. We perform an empirical evaluation of publicly available similarity digest schemes. The empirical evaluation highlights significant problems with previously proposed schemes; the TLSH scheme does not suffer from the flaws identified.
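The abstract's starting observation, that changing a single byte yields a completely different cryptographic hash, can be illustrated with a minimal sketch. It uses only Python's standard hashlib module with SHA-1; the helper name hex_digest_distance and the two sample inputs are assumptions made for illustration and do not come from the paper. It does not implement TLSH or any of the similarity digests named above.

```python
import hashlib


def hex_digest_distance(a: bytes, b: bytes) -> int:
    """Count the hex positions at which the SHA-1 digests of a and b differ.

    Hypothetical helper for illustration only; not part of TLSH.
    """
    da = hashlib.sha1(a).hexdigest()
    db = hashlib.sha1(b).hexdigest()
    return sum(x != y for x, y in zip(da, db))


# Two inputs that differ in exactly one byte ("dog" vs "cog").
original = b"The quick brown fox jumps over the lazy dog"
modified = b"The quick brown fox jumps over the lazy cog"

if __name__ == "__main__":
    print(hashlib.sha1(original).hexdigest())
    print(hashlib.sha1(modified).hexdigest())
    differing = hex_digest_distance(original, modified)
    print(f"{differing} of 40 hex digits differ")
```

Because the two digests look unrelated, comparing them reveals nothing about how similar the inputs are, which is the gap that similarity digests and locality sensitive hashes aim to fill.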
Pages: 7-13
Page count: 7