On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests - A Mixed-Methods Study of 10 Large Open-Source Projects

Cited by: 10
Authors
Khatoonabadi, Sayedhassan [1]
Costa, Diego Elias [1]
Abdalkareem, Rabe [2]
Shihab, Emad [1]
Affiliations
[1] Concordia Univ, Dept Comp Sci & Software Engn, Data Driven Anal Software Lab, 2155 Guy St, Montreal, PQ H3H 2L9, Canada
[2] Carleton Univ, Sch Comp Sci, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada
Keywords
Socio-technical factors; pull-based development; modern code review; social coding platforms; open-source software; mixed-methods research; AGREEMENT; GITHUB;
DOI
10.1145/3530785
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable number of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying dynamics of contributor-abandoned PRs, we conduct a mixed-methods study using both quantitative and qualitative methods. We curate a dataset consisting of 265,325 PRs including 4,450 abandoned ones from ten popular and mature GitHub projects and measure 16 features characterizing PRs, contributors, review processes, and projects. Using statistical and machine learning techniques, we find that complex PRs, novice contributors, and lengthy reviews have a higher probability of abandonment, and that the rate of PR abandonment fluctuates alongside the projects' maturity or workload. To identify why contributors abandon their PRs, we also manually examine a random sample of 354 abandoned PRs. We observe that the most frequent abandonment reasons are related to the obstacles faced by contributors, followed by the hurdles imposed by maintainers during the review process. Finally, we survey the top core maintainers of the studied projects to understand their perspectives on dealing with PR abandonment and on our findings.
Pages: 39
Related papers
82 in total
[1]  
[Anonymous], 2004, ANN M AM ED RES ASS
[2]   Visualizing the effects of predictor variables in black box supervised learning models [J].
Apley, Daniel W. ;
Zhu, Jingyu .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2020, 82 (04) :1059-1086
[3]  
Ben-Shachar M., 2020, Journal of Open Source Software, V5, P2815, DOI 10.21105/joss.02815
[4]  
Bischl B, 2016, J MACH LEARN RES, V17
[5]   What's in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform [J].
Borges, Hudson ;
Valente, Marco Tulio .
JOURNAL OF SYSTEMS AND SOFTWARE, 2018, 146 :112-129
[6]   Understanding the Factors that Impact the Popularity of GitHub Repositories [J].
Borges, Hudson ;
Hora, Andre ;
Valente, Marco Tulio .
32ND IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2016), 2016, :334-344
[7]   The use of the area under the roc curve in the evaluation of machine learning algorithms [J].
Bradley, AP .
PATTERN RECOGNITION, 1997, 30 (07) :1145-1159
[8]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[9]   DOMINANCE STATISTICS - ORDINAL ANALYSES TO ANSWER ORDINAL QUESTIONS [J].
CLIFF, N .
PSYCHOLOGICAL BULLETIN, 1993, 114 (03) :494-509
[10]   A COEFFICIENT OF AGREEMENT FOR NOMINAL SCALES [J].
COHEN, J .
EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 1960, 20 (01) :37-46