Human, bot or both? A study on the capabilities of classification models on mixed accounts

Cited by: 5

Authors:

Cassee, Nathan [1]

Kitsanelis, Christos [1]

Constantinou, Eleni [1]

Serebrenik, Alexander [1]

Affiliations:

[1] Eindhoven Univ Technol, Eindhoven, Netherlands

Source:

2021 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME 2021) | 2021

Keywords:

bot identification; classification model; social coding platforms; GitHub; software engineering;

DOI:

10.1109/ICSME52107.2021.00075

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Several bot detection algorithms have recently been discussed in the literature, as software bots that perform maintenance tasks have become more popular in recent years. State-of-the-art techniques detect bots based on a binary classification, where a GitHub account is either a human or a bot. However, this conceptualisation of bot detection as an account-level binary classification problem fails to account for 'mixed accounts', accounts that are shared between a human and a bot, and that therefore exhibit both bot and human activity. By using binary classification models for bot detection, researchers might hence mischaracterize both human and bot behavior in software maintenance. This calls for conceptualisation of bot detection through a comment-level classification. However, the single such approach solely investigates a small number of mixed account comments. The nature of mixed accounts on GitHub is thus still unknown, and the absence of appropriate datasets makes this a difficult problem to study. In this paper, we investigate three comment-level classification models and we evaluate these classifiers on a manually labeled dataset of mixed accounts. We find that the best classifiers based on these classification models achieve a precision and recall between 88% and 96%. However, even the most accurate comment-level classifier cannot accurately detect mixed accounts; rather, we find that textual content alone, or textual content combined with templates used by bots, are very effective features for the detection of both bot and mixed accounts. Our study calls for more accurate bot detection techniques capable of identifying mixed accounts, and as such supporting more refined insights in software maintenance activities performed by humans and bots on social coding sites.
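
To make the distinction between account-level and comment-level classification concrete, the following is a minimal sketch, not the authors' implementation. It assumes that some comment-level classifier has already labelled each comment as "bot" or "human", and it applies a strict, assumed aggregation rule: an account is "mixed" as soon as its comments carry both labels. The function names and account names are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def label_accounts(comment_predictions):
    """Aggregate per-comment labels into per-account labels.

    comment_predictions: iterable of (account, label) pairs, where label is
    assumed to be either "bot" or "human" (output of a hypothetical
    comment-level classifier).
    """
    labels_by_account = defaultdict(set)
    for account, label in comment_predictions:
        labels_by_account[account].add(label)

    account_labels = {}
    for account, labels in labels_by_account.items():
        if labels == {"bot"}:
            account_labels[account] = "bot"
        elif labels == {"human"}:
            account_labels[account] = "human"
        else:
            # Both kinds of activity observed: assumed rule marks it as mixed.
            account_labels[account] = "mixed"
    return account_labels


def precision_recall(true_labels, predicted_labels, positive="bot"):
    """Compute precision and recall for one class over paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical comment-level predictions for three accounts.
    predictions = [
        ("alice", "human"),
        ("ci-bot", "bot"),
        ("shared-account", "human"),
        ("shared-account", "bot"),
        ("alice", "human"),
    ]
    print(label_accounts(predictions))
```

Under this strict rule a single misclassified comment is enough to flip a pure bot or pure human account to "mixed", so a rule based on a threshold or proportion may be preferable in practice; that choice is left open here.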

Pages: 654-658

Page count: 5
