Identifying bot activity in GitHub pull request and issue comments

Cited by: 10
Authors
Golzadeh, Mehdi [1]
Decan, Alexandre [1]
Constantinou, Eleni [2]
Mens, Tom [1]
Affiliations
[1] Univ Mons, Software Engn Lab, Mons, Belgium
[2] Eindhoven Univ Technol, Eindhoven, Netherlands
Source
2021 IEEE/ACM THIRD INTERNATIONAL WORKSHOP ON BOTS IN SOFTWARE ENGINEERING (BOTSE 2021) | 2021
Keywords
GitHub; automated comments; distributed software development; classification model; empirical analysis
DOI
10.1109/BotSE52550.2021.00012
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Development bots are used on GitHub to automate repetitive activities. Such bots communicate with human actors via issue comments and pull request (PR) comments. Identifying such bot comments helps prevent bias in socio-technical studies of software development. To automate their identification, we propose a classification model based on natural language processing. Starting from a balanced ground-truth dataset of 19,282 PR and issue comments, we encode the comments as vectors using a combination of the bag-of-words and TF-IDF techniques. We train a range of binary classifiers to predict the type of comment (human or bot) from this vector representation. A multinomial Naive Bayes classifier provides the best results: on a test set containing 50% of the data, it achieves an average precision, recall, and F1 score of 0.88. Although the model shows promising results on pull request and issue comments, further work is required to generalize it to other types of activity, such as commit messages and code reviews.
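The pipeline described in the abstract (bag-of-words counts, TF-IDF weighting, multinomial Naive Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TfidfNaiveBayes`, the whitespace tokenizer, the smoothed IDF formula, and the six toy comments are all assumptions made for illustration, and the abstract does not say exactly how bag of words and TF-IDF were combined.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing is not described in the abstract.
    return text.lower().split()


class TfidfNaiveBayes:
    """Hypothetical multinomial Naive Bayes over IDF-weighted term counts."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing strength

    def fit(self, comments, labels):
        docs = [Counter(tokenize(c)) for c in comments]  # bag of words
        n_docs = len(docs)
        doc_freq = Counter(term for doc in docs for term in doc)
        # Smoothed inverse document frequency (an assumed variant).
        self.idf = {t: math.log((1 + n_docs) / (1 + df)) + 1
                    for t, df in doc_freq.items()}
        self.classes = sorted(set(labels))
        self.log_prior = {}
        self.log_likelihood = {}
        for cls in self.classes:
            cls_docs = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_docs) / n_docs)
            # Sum the TF-IDF weight of every term over the class's comments.
            totals = Counter()
            for doc in cls_docs:
                for term, count in doc.items():
                    totals[term] += count * self.idf[term]
            denom = sum(totals.values()) + self.alpha * len(self.idf)
            self.log_likelihood[cls] = {
                t: math.log((totals[t] + self.alpha) / denom)
                for t in self.idf
            }
        return self

    def predict(self, comment):
        doc = Counter(tokenize(comment))
        scores = {}
        for cls in self.classes:
            score = self.log_prior[cls]
            for term, count in doc.items():
                if term in self.idf:  # terms unseen in training are ignored
                    weight = count * self.idf[term]
                    score += weight * self.log_likelihood[cls][term]
            scores[cls] = score
        return max(scores, key=scores.get)


# Made-up toy comments, not taken from the paper's dataset.
train_comments = [
    "coverage increased by 2 percent merging this pull request",
    "build failed see the log for details",
    "this pull request has been automatically marked as stale",
    "thanks for the fix looks good to me",
    "i think we should rename this function",
    "could you add a test for this case",
]
train_labels = ["bot", "bot", "bot", "human", "human", "human"]

model = TfidfNaiveBayes().fit(train_comments, train_labels)
print(model.predict("build failed automatically"))
print(model.predict("thanks looks good"))
```

A real replication would train on the ground-truth dataset and report precision, recall, and F1 on a held-out split, as the abstract describes; the toy data here only shows the mechanics.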
Pages: 21-25
Number of pages: 5
Related papers
7 in total
  • [1] To Follow or Not to Follow: Understanding Issue/Pull-Request Templates on GitHub
    Li, Zhixing
    Yu, Yue
    Wang, Tao
    Lei, Yan
    Wang, Ying
    Wang, Huaimin
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2023, 49 (04) : 2530 - 2544
  • [2] Consistent or not? An investigation of using Pull Request Template in GitHub
    Zhang, Mengxi
    Liu, Huaxiao
    Chen, Chunyang
    Liu, Yuzhou
    Bai, Shuotong
    INFORMATION AND SOFTWARE TECHNOLOGY, 2022, 144
  • [3] An Exploratory Study of Reactions to Bot Comments on GitHub
    Farah, Juan Carlos
    Spaenlehauer, Basile
    Lu, Xinyang
    Ingram, Sandy
    Gillet, Denis
    2022 IEEE/ACM 4TH INTERNATIONAL WORKSHOP ON BOTS IN SOFTWARE ENGINEERING (BOTSE 2022), 2022, : 18 - 22
  • [4] A Comparative Study of the Effects of Pull Request on GitHub Projects
    Liu, Jing
    Li, Jiahao
    He, Lulu
    PROCEEDINGS 2016 IEEE 40TH ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE WORKSHOPS, VOL 1, 2016, : 313 - 322
  • [5] A bot identification model and tool based on GitHub activity sequences
    Chidambaram, Natarajan
    Decan, Alexandre
    Mens, Tom
    JOURNAL OF SYSTEMS AND SOFTWARE, 2025, 221
  • [6] RABBIT: A tool for identifying bot accounts based on their recent GitHub event history
    Chidambaram, Natarajan
    Mens, Tom
    Decan, Alexandre
    2024 IEEE/ACM 21ST INTERNATIONAL CONFERENCE ON MINING SOFTWARE REPOSITORIES, MSR, 2024, : 687 - 691
  • [7] A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments
    Golzadeh, Mehdi
    Decan, Alexandre
    Legay, Damien
    Mens, Tom
    JOURNAL OF SYSTEMS AND SOFTWARE, 2021, 175