On the effectiveness of clone detection by string matching

Cited by: 37
Authors
Ducasse, S [1 ]
Nierstrasz, O [1 ]
Rieger, M [1 ]
Affiliations
[1] Univ Bern, Inst Appl Math & Sci Comp, Software Composit Grp, CH-3012 Bern, Switzerland
Source
JOURNAL OF SOFTWARE MAINTENANCE AND EVOLUTION-RESEARCH AND PRACTICE | 2006 / Vol. 18 / Issue 01
Keywords
software maintenance; duplicated code; string matching; clone detection;
DOI
10.1002/smr.317
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Although duplicated code is known to pose severe problems for software maintenance, it is difficult to identify in large systems. Many different techniques have been developed to detect software clones, some of which are very sophisticated, but are also expensive to implement and adapt. Lightweight techniques based on simple string matching are easy to implement, but how effective are they? We present a simple string-based approach which we have successfully applied to a number of different languages such as COBOL, JAVA, C++, PASCAL, PYTHON, SMALLTALK, C and PDP-11 ASSEMBLER. In each case the maximum time to adapt the approach to a new language was less than 45 minutes. In this paper we investigate a number of simple variants of string-based clone detection that normalize differences due to common editing operations, and assess the quality of clone detection for very different case studies. Our results confirm that this inexpensive clone detection technique generally achieves high recall and acceptable precision. Overzealous normalization of the code before comparison, however, can result in an unacceptable number of false positives. Copyright (C) 2005 John Wiley & Sons, Ltd.
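As a rough illustration of the kind of lightweight technique the abstract describes, the sketch below compares two files line by line after a simple normalization step and reports runs of consecutive matching lines. It is not the authors' implementation: the function names `normalize` and `find_clones`, the choice of whitespace removal as the only normalization, and the `min_len` threshold are all assumptions made for this example.

```python
from collections import defaultdict


def normalize(line: str) -> str:
    """Drop all whitespace so that pure layout edits do not hide a match.

    Assumption: whitespace removal stands in for the paper's normalization
    variants, which are not detailed in the abstract.
    """
    return "".join(line.split())


def find_clones(lines_a, lines_b, min_len=3):
    """Return (start_a, start_b, length) for maximal runs of matching lines."""
    norm_a = [normalize(line) for line in lines_a]
    norm_b = [normalize(line) for line in lines_b]

    # Index the non-empty normalized lines of B by their exact text.
    index = defaultdict(list)
    for j, text in enumerate(norm_b):
        if text:
            index[text].append(j)

    clones = []
    for i, text in enumerate(norm_a):
        for j in index.get(text, ()):
            # Skip pairs that merely continue a run that started earlier.
            if i > 0 and j > 0 and norm_a[i - 1] and norm_a[i - 1] == norm_b[j - 1]:
                continue
            # Extend the run while consecutive non-empty lines keep matching.
            k = 0
            while (i + k < len(norm_a) and j + k < len(norm_b)
                   and norm_a[i + k] and norm_a[i + k] == norm_b[j + k]):
                k += 1
            if k >= min_len:
                clones.append((i, j, k))
    return clones


# Hypothetical input: the same three statements with different spacing.
file_a = ["x = 1", "y = x + 2", "print(y)", "done()"]
file_b = ["setup()", "x=1", "y  =  x+2", "print( y )", "other()"]
print(find_clones(file_a, file_b))  # [(0, 1, 3)]
```

Adapting such a sketch to another language would mostly mean changing `normalize`. Making it more aggressive, for example by replacing every identifier with a placeholder, would match more fragments but is the kind of step the abstract warns can raise the number of false positives.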
Pages: 37-58
Page count: 22