MapReduce-based Frequent Itemset Mining for Analysis of Electronic Evidence

Times cited: 0
Authors
Jiang, Xueqing [1 ]
Sun, Guozi [1 ,2 ]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Comp, Nanjing, Jiangsu, Peoples R China
[2] Jiangsu High Technol Res Key Lab Wireless Sensor, Nanjing, Jiangsu, Peoples R China
Source
2013 EIGHTH INTERNATIONAL WORKSHOP ON SYSTEMATIC APPROACHES TO DIGITAL FORENSIC ENGINEERING (SADFE) | 2013
Keywords
computer crime; PISPO; ISPO-tree; MapReduce; frequent itemset; data mining; association rules; TREE;
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods];
Discipline classification code
081202 ;
Abstract
Association rule mining can extract evidence relevant to computer crime from massive data, uncover association rules among itemsets, and further reveal crime trends and connections among different crimes. It can help police detect cases and prevent crime by supplying clues and criteria. Frequent itemset mining (FIM) plays a fundamental role in mining associations and correlations, and in many real-world data mining fields such as electronic evidence analysis. FP-growth is the best-known FIM algorithm for discovering frequent patterns. As data grows, time and space costs become the bottleneck of FP-growth mining algorithms. One existing incremental frequent pattern mining algorithm, SPO-tree, can perform incremental mining with a single scan. How to apply this algorithm more effectively to the analysis of electronic evidence is the focus of this paper. In past research, few authors have addressed the case where items already mined as frequent must be updated or a small amount of data is inserted. Existing algorithms are not well suited to this problem, especially in the forensic area. In this paper, we therefore propose a novel parallelized algorithm called PISPO, based on the cloud-computing framework MapReduce, which is widely used to cope with large-scale data. PISPO captures both the content and the state of the changed and original transaction datasets for distribution to the SPO-tree.
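As a rough illustration of the MapReduce idea the abstract refers to, the sketch below counts item support over partitioned transactions and keeps the items that meet a minimum support threshold. It is not the PISPO algorithm and builds no SPO-tree; the function names, the sample transactions, and the threshold are all hypothetical, and the map and reduce steps run in a single process rather than on a cluster.

```python
from collections import Counter
from functools import reduce


def map_partition(transactions):
    """Map step: per partition, count how many transactions contain each item."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))  # set() so an item counts once per transaction
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial support counts."""
    return left + right


def frequent_items(partitions, min_support):
    """Return items whose total support is at least min_support."""
    total = reduce(reduce_counts, map(map_partition, partitions), Counter())
    return {item: n for item, n in total.items() if n >= min_support}


# Hypothetical evidence log, already split into two partitions.
partitions = [
    [["login", "usb", "delete"], ["login", "delete"]],
    [["usb", "delete"], ["login", "email"]],
]
result = frequent_items(partitions, min_support=3)
```

Because the partial counts merge by simple addition, a newly arrived batch of transactions can be mapped on its own and folded into the existing totals without rescanning the original data, which is the general motivation behind incremental approaches.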
Pages: 6