Learning to Predict Code Review Completion Time in Modern Code Review

Cited by: 0
Authors
Moataz Chouchen
Ali Ouni
Jefferson Olongo
Mohamed Wiem Mkaouer
Affiliations
[1] University of Quebec, ETS Montreal
[2] Rochester Institute of Technology
Source
Empirical Software Engineering | 2023, Vol. 28
Keywords
Modern Code Review; Code review completion time estimation; Machine Learning; Software engineering
DOI
Not available
Abstract
Modern Code Review (MCR) has become a common practice in both open-source and proprietary projects. It is a widely acknowledged quality assurance practice that enables early detection of defects and poor coding practices, and it brings other benefits such as knowledge sharing, team awareness, and collaboration. For the review process to succeed, peer reviewers should complete their review tasks promptly while providing relevant feedback on the code change under review. In practice, however, code reviews can be significantly delayed by various socio-technical factors, which can affect project quality and cost. Moreover, existing MCR frameworks lack tool support to help developers estimate the time required to complete a code review before they accept or decline a review request. In this paper, we aim to build and validate an automated approach that predicts code review completion time in the context of MCR. We believe that its predictions can improve developer engagement by raising awareness of potential delays during code review. To this end, we formulate the prediction of code review completion time as a learning problem. In particular, we propose a framework of regression machine learning (ML) models built on 69 features drawn from 8 dimensions to (i) effectively estimate code review completion time and (ii) investigate the main factors that influence it. We conduct an empirical study on more than 280K code reviews from five projects hosted on Gerrit. Results indicate that the ML models significantly outperform baseline approaches, with relative improvements ranging from 7% to 49%. Furthermore, our experiments show that the most important features relate to the date of the code review request, the previous activities of the owner and reviewers, and the history of their interactions.
Our approach can help further engage the change owner and reviewers by raising their awareness regarding potential delays based on the predicted code review completion time.
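The abstract frames completion-time estimation as a regression problem and reports gains relative to baseline approaches. The sketch below illustrates that kind of evaluation under stated assumptions that do not come from the paper: a single hypothetical feature, ordinary least squares standing in for the paper's ML regressors, a baseline that always predicts the median training completion time, mean absolute error (MAE) as the error metric, and relative improvement computed as (baseline error - model error) / baseline error. The toy numbers are invented for illustration only.

```python
from statistics import mean, median


def fit_simple_regression(xs, ys):
    """Ordinary least squares for y = a + b * x with one feature; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has zero variance")
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    return a, b


def mae(actual, predicted):
    """Mean absolute error between two equal-length, non-empty sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - p) for t, p in zip(actual, predicted)) / len(actual)


def relative_improvement(baseline_error, model_error):
    """Share of the baseline's error removed by the model (0.25 means 25% lower error)."""
    if baseline_error == 0:
        raise ValueError("baseline error must be non-zero")
    return (baseline_error - model_error) / baseline_error


# Toy data, invented for illustration: x is a hypothetical per-review feature,
# y is the review completion time in days.
train_x, train_y = [1, 2, 3, 4, 5], [2, 4, 6, 8, 10]
test_x, test_y = [6, 7], [13, 13]

# Fit the stand-in regression model and predict the held-out reviews.
a, b = fit_simple_regression(train_x, train_y)
model_pred = [a + b * x for x in test_x]

# Hypothetical baseline: always predict the median training completion time.
baseline_pred = [median(train_y)] * len(test_y)

ri = relative_improvement(mae(test_y, baseline_pred), mae(test_y, model_pred))
print(f"relative improvement: {ri:.1%}")  # prints "relative improvement: 85.7%"
```

Here the baseline's MAE is 7 days and the model's is 1 day, so the model removes 6/7 of the baseline's error. The abstract does not state which error metric or formula underlies its 7% to 49% range, so this definition is only one plausible reading.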