Detection and Elimination of Systematic Labeling Bias in Code Reviewer Recommendation Systems

Cited by: 10
Authors
Tecimer, K. Ayberk [1 ]
Tuzun, Eray [2 ]
Dibeklioglu, Hamdi [2 ]
Erdogmus, Hakan [3 ]
Affiliations
[1] Tech Univ Munich, Munich, Germany
[2] Bilkent Univ, Ankara, Turkey
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS OF EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING (EASE 2021) | 2021
Keywords
modern code review; ground truth; labeling bias elimination; systematic labeling bias; data cleaning; code review recommendation
DOI
10.1145/3463274.3463336
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models on datasets collected from real open-source or industrial projects. These techniques invariably presume that the datasets reliably represent the "ground truth." In the context of a classification problem, ground truth refers to the objectively correct class labels used to build models from a dataset or to evaluate a model's performance. In a project dataset used to build a code reviewer recommendation system, the code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. In practice, however, the picked reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that such datasets tend to suffer from systematic labeling bias, making the ground truth unreliable. Models and recommendation systems built on such datasets may therefore perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers who do not ensure a permanent, successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets (HIVE and QT Creator) and with five code reviewer recommendation techniques (Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree). Our debiasing approach appears promising: it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
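As an illustration of the evaluation metric named in the abstract, below is a minimal Python sketch of Mean Reciprocal Rank over reviewer recommendations, paired with a simple debiasing filter in the spirit of the approach described (dropping bug-related PRs whose labeled reviewer did not yield a lasting fix). The data layout, field names, and the reopened-fix heuristic are assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch: MRR over reviewer recommendations, plus a simple
    # labeling-bias filter. Field names ("features", "actual_reviewer",
    # "reopened") are illustrative assumptions, not the paper's data model.

    def mean_reciprocal_rank(prs, recommend):
        """Average of 1/rank of the labeled reviewer across all PRs."""
        total = 0.0
        for pr in prs:
            ranking = recommend(pr["features"])  # ranked list of reviewers
            if pr["actual_reviewer"] in ranking:
                total += 1.0 / (ranking.index(pr["actual_reviewer"]) + 1)
        return total / len(prs) if prs else 0.0

    def remove_labeling_bias(prs):
        """Keep only PRs whose labeled reviewer plausibly ensured a lasting
        fix, approximated here by the fix never having been reopened."""
        return [pr for pr in prs if not pr.get("reopened", False)]

    # Usage: compare a recommender's MRR before and after debiasing.
    # print(mean_reciprocal_rank(all_prs, my_recommender))
    # print(mean_reciprocal_rank(remove_labeling_bias(all_prs), my_recommender))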
Pages: 181-190
Number of pages: 10