Detection and Elimination of Systematic Labeling Bias in Code Reviewer Recommendation Systems

Cited by: 10
Authors
Tecimer, K. Ayberk [1 ]
Tuzun, Eray [2 ]
Dibeklioglu, Hamdi [2 ]
Erdogmus, Hakan [3 ]
Affiliations
[1] Tech Univ Munich, Munich, Germany
[2] Bilkent Univ, Ankara, Turkey
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
PROCEEDINGS OF EVALUATION AND ASSESSMENT IN SOFTWARE ENGINEERING (EASE 2021) | 2021
Keywords
modern code review; ground truth; labeling bias elimination; systematic labeling bias; data cleaning; code review recommendation
DOI
10.1145/3463274.3463336
CLC number
TP31 [Computer Software]
Discipline codes
081202; 0835
Abstract
Reviewer selection in modern code review is crucial for effective code reviews. Several techniques exist for recommending reviewers appropriate for a given pull request (PR). Most code reviewer recommendation techniques in the literature build and evaluate their models on datasets collected from real open-source or industrial projects. These techniques invariably presume that the datasets reliably represent the "ground truth." In the context of a classification problem, ground truth refers to the objectively correct class labels used to build models from a dataset or to evaluate a model's performance. In a project dataset used to build a code reviewer recommendation system, the code reviewer picked for a PR is usually assumed to be the best code reviewer for that PR. In practice, however, the picked reviewer may not be the best possible code reviewer, or even a qualified one. Recent code reviewer recommendation studies suggest that such datasets tend to suffer from systematic labeling bias, making the ground truth unreliable. Models and recommendation systems built on such datasets may therefore perform poorly in real practice. In this study, we introduce a novel approach to automatically detect and eliminate systematic labeling bias in code reviewer recommendation systems. The bias that we remove results from selecting reviewers who do not ensure a permanent, successful fix for a bug-related PR. To demonstrate the effectiveness of our approach, we evaluated it on two open-source project datasets (HIVE and QT Creator) and with five code reviewer recommendation techniques (Profile-Based, RSTrace, Naive Bayes, k-NN, and Decision Tree). Our debiasing approach appears promising: it improved the Mean Reciprocal Rank (MRR) of the evaluated techniques by up to 26% on the datasets used.
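As an illustration of the evaluation metric named in the abstract, below is a minimal Python sketch of Mean Reciprocal Rank over reviewer recommendations, paired with a simple debiasing filter in the spirit of the approach described (dropping bug-related PRs whose labeled reviewer did not yield a lasting fix). The data layout, field names, and the reopened-fix heuristic are assumptions for illustration, not the authors' implementation.

    # Hypothetical sketch: MRR over reviewer recommendations, plus a simple
    # labeling-bias filter. Field names ("features", "actual_reviewer",
    # "reopened") are illustrative assumptions, not the paper's data model.

    def mean_reciprocal_rank(prs, recommend):
        """Average of 1/rank of the labeled reviewer across all PRs."""
        total = 0.0
        for pr in prs:
            ranking = recommend(pr["features"])  # ranked list of reviewers
            if pr["actual_reviewer"] in ranking:
                total += 1.0 / (ranking.index(pr["actual_reviewer"]) + 1)
        return total / len(prs) if prs else 0.0

    def remove_labeling_bias(prs):
        """Keep only PRs whose labeled reviewer plausibly ensured a lasting
        fix, approximated here by the fix never having been reopened."""
        return [pr for pr in prs if not pr.get("reopened", False)]

    # Usage: compare a recommender's MRR before and after debiasing.
    # print(mean_reciprocal_rank(all_prs, my_recommender))
    # print(mean_reciprocal_rank(remove_labeling_bias(all_prs), my_recommender))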
Pages: 181-190
Number of pages: 10