PyTraceBugs: A Large Python Code Dataset for Supervised Machine Learning in Software Defect Prediction

Cited by: 7
Authors
Akimova, Elena N. [1, 2]
Bersenev, Alexander Yu. [1, 2]
Deikov, Artem A. [1, 2]
Kobylkin, Konstantin S. [1, 2]
Konygin, Anton V. [1]
Mezentsev, Ilya P. [1, 2]
Misilov, Vladimir E. [1, 2]
Affiliations
[1] UB RAS, Krasovskii Inst Math & Mech, S Kovalevskaya St 16, Ekaterinburg 620108, Russia
[2] Ural Fed Univ, Mira St 19, Ekaterinburg 620002, Russia
Source
2021 28TH ASIA-PACIFIC SOFTWARE ENGINEERING CONFERENCE (APSEC 2021) | 2021
Keywords
defect prediction; bug dataset; data mining
DOI
10.1109/APSEC53868.2021.00022
Chinese Library Classification number
TP31 [Computer Software]
Subject classification code
081202; 0835
Abstract
Contemporary software engineering tools employ deep learning methods to identify bugs and defects in source code. Being data-hungry, supervised deep neural network models require large labeled datasets for robust and accurate training. In contrast to, say, Java, there is a lack of such datasets for Python. Most of the known datasets containing labeled Python source code are relatively small. Those datasets are suitable for testing already built deep learning models, but not for training them. Therefore, larger labeled datasets have to be created, based on well-received algorithmic principles for selecting relevant source code from the available public codebases. In this work, a large dataset of labeled Python source code, named PyTraceBugs, is created. It is intended for training, validating, and evaluating large deep learning models that identify a special class of low-level bugs in source code snippets: bugs that manifest as raised exceptions reported in standard traceback messages. Here, a code snippet is assumed to be either a function or a method implementation. The dataset contains 5.7 million correct source code snippets and 24 thousand buggy snippets from public GitHub repositories. The most represented bugs are: absence of an attribute, empty object, index out of range, and text encoding/decoding errors. The dataset is split into training, validation, and test samples. Confidence in the labeling of snippets as buggy or correct is about 85% according to our estimates. The labeling of the snippets in the test sample is additionally validated manually to be almost 100% confident. To demonstrate the advantages of our dataset, it is used to train a binary classification model for distinguishing buggy and correct source code. This model employs pretrained BERT-like contextual embeddings. Its performance is as follows: precision on the test set is 96% for the buggy source code and 61% for the correct source code, whereas recall is 34% and 99%, respectively. The model performance is also estimated on the known BugsInPy dataset: here, it reports approximately 14% of buggy snippets.
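As an illustration of the bug class the abstract describes, the following minimal Python sketch triggers one example per named category and reads the exception type off the final line of the standard traceback. It is not taken from the paper: the four snippet functions, the helper names `capture` and `exception_name_from_traceback`, and the mapping of each category to a specific built-in exception (for instance, treating "empty object" as an operation on `None`) are assumptions made here for illustration only.

```python
import traceback


def exception_name_from_traceback(tb_text: str) -> str:
    """Return the exception class name found on the last line of a traceback."""
    last_line = tb_text.strip().splitlines()[-1]
    # The final line has the form "ExceptionName: message" (or just the name).
    return last_line.split(":", 1)[0].strip()


# Hypothetical buggy snippets, one per category named in the abstract.
# The category-to-exception mapping is an assumption, not the paper's definition.
def missing_attribute():
    user = {}.get("user")        # evaluates to None
    return user.name             # AttributeError


def empty_object():
    items = {}.get("items")      # evaluates to None
    return items[0]              # TypeError


def index_out_of_range():
    values = [1, 2, 3]
    return values[5]             # IndexError


def bad_decoding():
    return b"\xff".decode("utf-8")   # UnicodeDecodeError


def capture(func) -> str:
    """Run func and return the formatted traceback, or '' if nothing is raised."""
    try:
        func()
    except Exception:
        return traceback.format_exc()
    return ""


if __name__ == "__main__":
    snippets = (missing_attribute, empty_object, index_out_of_range, bad_decoding)
    for snippet in snippets:
        print(snippet.__name__, "->", exception_name_from_traceback(capture(snippet)))
```

Each function here plays the role of a code snippet in the abstract's sense (a function or method implementation), and the traceback text is the signal that marks it as buggy. How PyTraceBugs itself extracts and labels snippets is not specified in this record.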
Pages: 141-151
Page count: 11