PyTraceBugs: A Large Python Code Dataset for Supervised Machine Learning in Software Defect Prediction

Cited by: 7

Authors:

Akimova, Elena N. [1,2]

Bersenev, Alexander Yu [1,2]

Deikov, Artem A. [1,2]

Kobylkin, Konstantin S. [1,2]

Konygin, Anton V. [1]

Mezentsev, Ilya P. [1,2]

Misilov, Vladimir E. [1,2]

Affiliations:

[1] UB RAS, Krasovskii Inst Math & Mech, S Kovalevskaya St 16, Ekaterinburg 620108, Russia

[2] Ural Fed Univ, Mira St 19, Ekaterinburg 620002, Russia

Source:

2021 28th Asia-Pacific Software Engineering Conference (APSEC 2021) | 2021

Keywords:

defect prediction; bug dataset; data mining;

DOI:

10.1109/APSEC53868.2021.00022

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Contemporary software engineering tools employ deep learning methods to identify bugs and defects in source code. Being data-hungry, supervised deep neural network models require large labeled datasets for robust and accurate training. In contrast to, say, Java, there is a lack of such datasets for Python. Most known datasets of labeled Python source code are relatively small. Those datasets are suitable for testing built deep learning models, but not for training them. Therefore, larger labeled datasets have to be created, using well-established algorithmic principles to select relevant source code from the available public codebases. In this work, a large dataset of labeled Python source code, named PyTraceBugs, is created. It is intended for training, validating, and evaluating large deep learning models that identify a special class of low-level bugs in source code snippets: bugs that manifest as raised exceptions reported in standard traceback messages. Here, a code snippet is assumed to be either a function or a method implementation. The dataset contains 5.7 million correct source code snippets and 24 thousand buggy snippets from public GitHub repositories. The most represented bugs are: absence of attribute, empty object, index out of range, and text encoding/decoding errors. The dataset is split into training, validation, and test samples. Confidence in the labeling of snippets as buggy or correct is about 85% according to our estimates. The labeling of the snippets in the test sample is additionally validated manually, making it almost 100% confident. To demonstrate the advantages of our dataset, it is used to train a binary classification model for distinguishing buggy from correct source code. This model employs pretrained BERT-like contextual embeddings. Its performance is as follows: precision on the test set is 96% for the buggy source code and 61% for the correct source code, whereas recall is 34% and 99%, respectively. The model performance is also estimated on the known BugsInPy dataset: here, it reports approximately 14% of buggy snippets.
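The four most represented bug classes named in the abstract can be illustrated with a minimal Python sketch. This is not code from the dataset or the paper: the function names, the `exception_name` helper, and the mapping of each bug class to a specific built-in exception (for example, reading "empty object" as an attribute access on `None`) are assumptions made for illustration only.

```python
import traceback


def missing_attribute():
    # "Absence of attribute": an int has no attribute called `name`.
    return (42).name


def empty_object():
    # "Empty object" (assumed reading): a method is called on None.
    result = None
    return result.strip()


def index_out_of_range():
    # "Index out of range": valid indices are 0..2.
    items = [1, 2, 3]
    return items[3]


def decoding_error():
    # "Text encoding/decoding error": 0xff is not a valid UTF-8 start byte.
    return b"\xff\xfe".decode("utf-8")


def exception_name(func):
    """Run func; return the exception class name from its traceback, or None."""
    try:
        func()
    except Exception:
        # format_exc() returns the standard traceback text; its last line
        # has the form "ExceptionName: message".
        last_line = traceback.format_exc().strip().splitlines()[-1]
        return last_line.split(":", 1)[0]
    return None


snippets = (missing_attribute, empty_object, index_out_of_range, decoding_error)
observed = {f.__name__: exception_name(f) for f in snippets}

for name, exc in observed.items():
    print(f"{name}: {exc}")
```

Each function here corresponds to one "snippet" in the abstract's sense (a single function implementation), and the exception name is read from the last line of the standard traceback message.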


Pages: 141-151

Page count: 11
