Software-Clone Rates in Open-Source Programs Written in C or C++

Cited by: 15
Authors
Koschke, Rainer [1 ]
Bazrafshan, Saman [1 ]
Affiliation
[1] Univ Bremen, D-28359 Bremen, Germany
Source
2016 IEEE 23RD INTERNATIONAL CONFERENCE ON SOFTWARE ANALYSIS, EVOLUTION, AND REENGINEERING (SANER), VOL 3 | 2016
Keywords
SYSTEM;
DOI
10.1109/SANER.2016.28
CLC number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
It is often claimed that duplicated code, also known as software clones, occurs frequently. Different researchers have reported clone rates between 19% and 28%, and in extreme cases even 59%, for particular systems. It is not clear, however, whether those systems are just outliers. In this paper, we analyze about 7,800 open-source projects written in C or C++, totaling 240 MSLOC, and measure their clone rates. We use statistical analysis to estimate the mean clone rates in open-source projects. Based on our findings, we could not confirm the high clone rates reported in previous studies as expected averages. Except for small projects that include a few copied and modified files, we found rather low clone rates compared to previous studies. For instance, when a minimal clone length of 100 tokens (roughly 16 LOC) is required, we found an average type-2 clone rate of about 12%. For type-1 clones of this length, we found an average clone rate of only 1%. However, our results also show that cloning is common: only 20% of the projects have no type-2 clone of at least 100 tokens, and 44% of the projects have at least one type-1 clone of at least 100 tokens.
Pages: 1 / 7
Page count: 7