Human, bot or both? A study on the capabilities of classification models on mixed accounts

Cited by: 5
Authors
Cassee, Nathan [1]
Kitsanelis, Christos [1]
Constantinou, Eleni [1]
Serebrenik, Alexander [1]
Affiliation
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
Source
2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021) | 2021
Keywords
bot identification; classification model; social coding platforms; GitHub; software engineering
DOI
10.1109/ICSME52107.2021.00075
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Several bot detection algorithms have recently been discussed in the literature, as software bots that perform maintenance tasks have become more popular in recent years. State-of-the-art techniques detect bots based on a binary classification, where a GitHub account is either a human or a bot. However, this conceptualisation of bot detection as an account-level binary classification problem fails to account for 'mixed accounts', accounts that are shared between a human and a bot, and that therefore exhibit both bot and human activity. By using binary classification models for bot detection, researchers might hence mischaracterize both human and bot behavior in software maintenance. This calls for conceptualisation of bot detection through a comment-level classification. However, the single such approach solely investigates a small number of mixed account comments. The nature of mixed accounts on GitHub is thus yet unknown, and the absence of appropriate datasets makes this a difficult problem to study. In this paper, we investigate three comment-level classification models and we evaluate these classifiers on a manually labeled dataset of mixed accounts. We find that the best classifiers based on these classification models achieve a precision and recall between 88% and 96%. However, even the most accurate comment-level classifier cannot accurately detect mixed accounts; rather, we find that textual content alone, or textual content combined with templates used by bots, are very effective features for the detection of both bot and mixed accounts. Our study calls for more accurate bot detection techniques capable of identifying mixed accounts, and as such supporting more refined insights in software maintenance activities performed by humans and bots on social coding sites.
Pages: 654-658
Number of pages: 5