The Efficient Implementation of Distributed Indexing with Hadoop for Digital Investigations on Big Data

Cited by: 12
Authors
Lee, Taerim [1]
Lee, Hyejoo [3]
Rhee, Kyung-Hyune [2]
Shin, Sang Uk
Affiliations
[1] Pukyong Natl Univ, Dept Informat Secur, Grad Sch, Pusan, South Korea
[2] Pukyong Natl Univ, Dept IT Convergence & Applicat Engn, Pusan, South Korea
[3] Kongju Natl Univ, Dept Appl Math, Gongju, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Electronic Discovery; e-Discovery; Digital Forensics; Evidence Search; Indexing Performance; Hadoop MapReduce; Distributed Indexing;
DOI
10.2298/CSIS130920063L
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Big Data brings new challenges to e-Discovery and digital forensics, most of which concern how the data is processed. Because time and cost largely determine whether a digital investigation succeeds or fails, an effective indexing method for efficient search is a prerequisite for finding relevant evidence in Big Data quickly and accurately. This paper therefore introduces DTPS, a Distributed Text Processing System based on Hadoop, and explains how it differs from related work in order to show why it is needed. The paper also reports experimental results aimed at finding the best implementation strategy for distributed indexing with Hadoop MapReduce, and assesses the practical value of DTPS by comparing its performance with that of similar tools. In short, the ultimate goal of this research is to develop a useful search engine, with Big Data indexing as its major component, for a future e-Discovery cloud service.
Pages: 1037-1054
Number of pages: 18