An Exploratory Study of Interface Redundancy in Code Repositories

Cited by: 6
Authors
de Paula, Adriano Carvalho [1 ]
Guerra, Eduardo [1 ]
Lopes, Cristina V. [2 ]
Sajnani, Hitesh [4 ]
Lazzarini Lemos, Otavio Augusto [2 ,3 ]
Affiliations
[1] Inst Nacl Pesquisas Espaciais, Sao Jose Dos Campos, Brazil
[2] Univ Calif Irvine, Donald Bren Sch Informat & Comp Sci, Irvine, CA USA
[3] Fed Univ Sao Paulo SJ dos Campos, Sci & Technol Dept, Sao Jose Dos Campos, SP, Brazil
[4] Microsoft Inc, Los Angeles, CA USA
Source
2016 IEEE 16TH INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM) | 2016
Keywords
CLONE; SYSTEM;
DOI
10.1109/SCAM.2016.31
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
An important property of software repositories is their level of cross-project redundancy. For instance, much has been done to assess how much code cloning happens across software corpora. In this paper we study a much less targeted type of replication: Interface Redundancy (IR). IR refers to the level of repetition of whole method interfaces - return type, method name, and parameter types - across a code corpus. This type of redundancy is important because if two non-trivial methods share the same interface, it is very likely that they implement analogous functions, even though their code, structure, or vocabulary might differ. A certain level of IR is a requirement for approaches that rely on the recurrence of interfaces to fulfill a given task (e.g., interface-driven code search, IDCS). In this paper we report on an experiment to measure IR in a large-scale Java repository. Our target corpus contains more than 380,000 methods from 99 Java projects randomly extracted from an open source repository. Results are promising, as they show that the chance of an interface from a non-trivial method repeating itself across a large repository is around 25% (i.e., approximately 1/4 of such interfaces are redundant). Also, more than 80% of the target projects contained IR (with the average percentage of redundant interfaces for these projects being above 30%). As additional analyses, we investigated the distribution of the different types of redundant interfaces (e.g., intra- vs. inter-project); characterized the redundant interfaces and showed that such knowledge can help improve IDCS; and provided evidence that only a very small part of IR refers to method cloning (around 0.002%).
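To make the definition of IR concrete, the sketch below treats an interface as the triple (return type, method name, parameter types) and counts how many distinct interfaces occur more than once in a toy corpus, separating out those that recur across different projects. It is only an illustration under stated assumptions: the `Method` record, the `redundancy` function, the sample corpus, and the choice of "share of distinct interfaces that repeat" as the metric are all hypothetical, and the paper's actual measurement procedure (including how non-trivial methods are selected) may differ.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Tuple


class Method(NamedTuple):
    """Hypothetical record for one Java method found in a corpus."""
    project: str
    return_type: str
    name: str
    param_types: Tuple[str, ...]


def interface_of(m):
    # Interface as described in the abstract: return type, name, parameter types.
    return (m.return_type, m.name, m.param_types)


def redundancy(methods):
    """Return (share, inter_project).

    share: fraction of distinct interfaces that occur more than once
           (an assumed metric, not necessarily the paper's).
    inter_project: redundant interfaces that appear in more than one project.
    """
    counts = Counter(interface_of(m) for m in methods)
    projects = defaultdict(set)
    for m in methods:
        projects[interface_of(m)].add(m.project)

    redundant = {i for i, n in counts.items() if n > 1}
    inter_project = {i for i in redundant if len(projects[i]) > 1}
    share = len(redundant) / len(counts) if counts else 0.0
    return share, inter_project


# Toy corpus (invented for illustration only).
corpus = [
    Method("projA", "String", "capitalize", ("String",)),
    Method("projB", "String", "capitalize", ("String",)),   # inter-project repeat
    Method("projA", "int", "countWords", ("String",)),
    Method("projA", "int", "countWords", ("String",)),      # intra-project repeat
    Method("projB", "boolean", "isLeapYear", ("int",)),
    Method("projC", "double", "median", ("double[]",)),
]

share, inter_project = redundancy(corpus)
print(share, inter_project)
```

Two methods with the same interface need not share any code, which is why this kind of count is distinct from clone detection: it compares only signatures, never method bodies.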
Pages: 107-116
Page count: 10
Related papers
31 in total
[1]  
[Anonymous], 2008, Introduction to information retrieval
[2]   Sourcerer: An infrastructure for large-scale collection and analysis of open-source code [J].
Bajracharya, Sushil ;
Ossher, Joel ;
Lopes, Cristina .
SCIENCE OF COMPUTER PROGRAMMING, 2014, 79 :241-259
[3]  
BAKER BS, 1995, SECOND WORKING CONFERENCE ON REVERSE ENGINEERING, PROCEEDINGS, P86, DOI 10.1109/WCRE.1995.514697
[4]   The Plastic Surgery Hypothesis [J].
Barr, Earl T. ;
Brun, Yuriy ;
Devanbu, Premkumar ;
Harman, Mark ;
Sarro, Federica .
22ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (FSE 2014), 2014, :306-317
[5]   Comparison and evaluation of clone detection tools [J].
Bellon, Stefan ;
Koschke, Rainer ;
Antoniol, Giuliano ;
Krinke, Jens ;
Merlo, Ettore .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2007, 33 (09) :577-591
[6]   A Survey of Automatic Query Expansion in Information Retrieval [J].
Carpineto, Claudio ;
Romano, Giovanni .
ACM COMPUTING SURVEYS, 2012, 44 (01)
[7]   The NiCad Clone Detector [J].
Cordy, James R. ;
Roy, Chanchal K. .
2011 IEEE 19TH INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC), 2011, :219-+
[8]  
Fraser G, 2012, PROC INT CONF SOFTW, P178, DOI 10.1109/ICSE.2012.6227195
[9]  
Gabel M, 2010, 18 ACM SIGSOFT INT S, P147
[10]   How Should We Measure Functional Sameness from Program Source Code? An Exploratory Study on Java Methods [J].
Higo, Yoshiki ;
Kusumoto, Shinji .
22ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON THE FOUNDATIONS OF SOFTWARE ENGINEERING (FSE 2014), 2014, :294-305