A parallel and efficient approach to large scale clone detection

被引:14
作者
Sajnani, Hitesh [1 ]
Saini, Vaibhav [1 ]
Lopes, Cristina [1 ]
机构
[1] Univ Calif Irvine, Donald Bren Sch Informat & Comp Sci, Irvine, CA 92697 USA
基金
美国国家科学基金会;
关键词
clone detection; large scale; parallel; mapreduce; index-based; SYSTEM; CODE;
D O I
10.1002/smr.1707
中图分类号
TP31 [计算机软件];
学科分类号
081202 ; 0835 ;
摘要
We propose a new token-based approach for large -scale code clone detection, which is based on a filtering heuristic that reduces the number of token comparisons when the two code blocks are compared. We also present a MapReduce based parallel algorithm that uses the filtering heuristic and scales to thousands of projects. The filtering heuristic is generic and can also be used in conjunction with other token-based approaches. In that context, we demonstrate how it can increase the retrieval speed and decrease the memory usage of the index-based approaches. In our experiments on 36 open source Java projects, we found that: (i) filtering reduces token comparisons by a factor of 10, and thus increasing the speed of clone detection by a factor of 1.5; (ii) the speed-up and scale-up of the parallel approach using filtering is near-linear on a cluster of 2-32 nodes for 150-2800 projects; and (iii) filtering decreases the memory usage of index-based approach by half and the search time by a factor of 5. Copyright (C) 2015 John Wiley & Sons, Ltd.
引用
收藏
页码:402 / 429
页数:28
相关论文
共 33 条
  • [1] Baker B. S., 1995, P WORK C REV ENG
  • [2] Clone detection using abstract syntax trees
    Baxter, ID
    Yahin, A
    Moura, L
    Sant'Anna, M
    Bier, L
    [J]. INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE, PROCEEDINGS, 1998, : 368 - 377
  • [3] Comparison and evaluation of clone detection tools
    Bellon, Stefan
    Koschke, Rainer
    Antoniol, Giuliano
    Krinke, Jens
    Merlo, Ettore
    [J]. IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2007, 33 (09) : 577 - 591
  • [4] Cordy J., 2011, P ICPC
  • [5] Davies J., 2011, P MSR
  • [6] German D. M., 2009, MIN SOFTW REP 2009 M
  • [7] Gode N, 2009, P CSMR
  • [8] Hemel A., 2012, 2012 19th Working Conference on Reverse Engineering (WCRE), P357, DOI 10.1109/WCRE.2012.45
  • [9] Hummel B., 2010, P ICSM
  • [10] Ishihara T., 2012, 2012 19th Working Conference on Reverse Engineering (WCRE), P387, DOI 10.1109/WCRE.2012.48