Curated Email-Based Code Reviews Datasets

Times cited: 0
Authors
Liang, Mingzhao [1]
Charoenwet, Wachiraphan [1]
Thongtanunam, Patanamon [1]
Affiliations
[1] Univ Melbourne, Melbourne, Vic, Australia
Source
2024 IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) | 2024
Funding
Australian Research Council
Keywords
PARTICIPATION;
DOI
10.1145/3643991.3644872
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Code review is an important practice that improves the overall quality of a proposed patch (i.e., a set of code changes). While much research has focused on tool-based code review (e.g., Gerrit and GitHub), many traditional open-source software (OSS) projects still conduct code reviews through email. However, because email data is unstructured, mining email-based code reviews can be challenging, which hinders researchers from studying the code review practices of such long-standing OSS projects. Therefore, this paper presents large-scale datasets of email-based code reviews from 167 projects across three OSS communities (i.e., Linux Kernel, OzLabs, and FFmpeg). We mined the data from Patchwork, a web-based patch-tracking system for email-based code review, and curated it by grouping each submitted patch with its revised versions and by grouping email aliases. Our datasets include a total of 4.2M patches in 2.1M patch groups, and 169K email addresses belonging to 141K individuals. Our published artefacts include the datasets as well as a tool suite to crawl, curate, and store Patchwork data. With our datasets, future work can directly study the email-based code review practices of large OSS projects without additional effort in data collection and curation.
Pages: 294-298
Number of pages: 5