Continuously mining distributed version control systems: an empirical study of how Linux uses Git

Cited by: 22
Authors
German, Daniel M. [1]
Adams, Bram [2]
Hassan, Ahmed E. [3]
Affiliations
[1] Univ Victoria, Victoria, BC, Canada
[2] Polytech Montreal, Montreal, PQ, Canada
[3] Queens Univ, Kingston, ON, Canada
Keywords
Mining software repositories; Distributed version control; Rebasing; Empirical software engineering; Measuring bias; Linux; Open source development;
DOI
10.1007/s10664-014-9356-2
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Distributed version control systems (D-VCSs, such as Git and Mercurial) and their hosting services (such as GitHub and Bitbucket) have revolutionized the way in which developers collaborate by allowing them to freely exchange and integrate code changes in a peer-to-peer fashion. However, this flexibility comes at a price: code changes are hard to track because of the proliferation of code repositories and because developers modify ("rebase") and filter ("cherry-pick") the history of these changes to streamline their integration into the repositories of other developers. As a consequence, researchers and practitioners, who typically only consider the (cleaned up) history in the official project repository, are unaware of important elements and activities in the collaborative software development process. In this paper, we present a method that continuously mines all known D-VCSs of a software project to uncover the complete development history of a project. We use this method (1) to show the divergence between the code development history in the official Linux kernel repository and the complete kernel development history, and (2) to investigate the characteristics of the ecosystem of Git repositories of the Linux kernel. Finally, we discuss how continuous mining could be adopted by current D-VCS hosting services.
Pages: 260-299
Page count: 40