Some From Here, Some From There: Cross-Project Code Reuse in GitHub

Citations: 65
Authors
Gharehyazie, Mohammad [1]
Ray, Baishakhi [2]
Filkov, Vladimir [1]
Affiliations
[1] Univ Calif Davis, Davis, CA 95616 USA
[2] Univ Virginia, Charlottesville, VA 22903 USA
Source
2017 IEEE/ACM 14TH INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES (MSR 2017) | 2017
Keywords
CLONE DETECTION;
DOI
10.1109/MSR.2017.15
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Code reuse has well-known benefits for code quality, coding efficiency, and maintenance. Open Source Software (OSS) programmers gladly share their own code and they happily reuse others'. Social programming platforms like GitHub have normalized code foraging via their common platforms, enabling code search and reuse across different projects. Removing project borders may facilitate more efficient code foraging, and consequently faster programming. But looking for code across projects takes longer and, once found, may be more challenging to tailor to one's needs. Learning how much code reuse goes on across projects, and identifying emerging patterns in past cross-project search behavior, may help future foraging efforts. To understand cross-project code reuse, here we present an in-depth study of cloning in GitHub. Using Deckard, a clone finding tool, we identified copies of code fragments across projects, and investigated their prevalence and characteristics using statistical and network science approaches, and with multiple case studies. By triangulating findings from different methods, we find that cross-project cloning is prevalent in GitHub, ranging from cloning a few lines of code to whole project repositories. Some of the projects serve as popular sources of clones, and others seem to contain more clones than their fair share. Moreover, we find that ecosystem cloning follows an onion model: most clones come from the same project, then from projects in the same application domain, and finally from projects in different domains. Our results show directions for new tools that can facilitate code foraging and sharing within GitHub.
Pages: 291-301
Page count: 11