Within-Project Software Aging Defect Prediction Based on Active Learning

Times cited: 4
Authors
Liang, Mengting [1]
Li, Dimeng [1]
Xu, Bin [1]
Zhao, Dongdong [1]
Yu, Xiao [1]
Xiang, Jianwen [1]
Affiliations
[1] Wuhan Univ Technol, Sch Comp & Artificial Intelligence, Wuhan, Peoples R China
Source
2021 IEEE INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING WORKSHOPS (ISSREW 2021) | 2021
Keywords
software aging; aging-related bugs prediction; active learning; hashing-based undersampling ensemble
DOI
10.1109/ISSREW53611.2021.00037
Chinese Library Classification
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Long-running software systems tend to exhibit performance degradation and an increasing failure rate, a phenomenon known as software aging. The bugs that cause aging are called Aging-Related Bugs (ARBs); they may cause serious economic loss or even endanger human safety. ARB prediction has been proposed to help discover and remove ARBs. However, an ARB prediction model often needs a large amount of training data to reach high classification performance. In practice, labeled data are often scarce, and labeling all samples manually is difficult. Furthermore, ARB datasets suffer from a serious class imbalance problem. To address these two problems (scarce labels and class imbalance), we propose a framework named QUIRE-HUE. On the one hand, we use an approach named Active Learning by Querying Informative and Representative Examples (QUIRE) to select a few informative and representative samples to label for the training set, which reduces the labeling cost while still yielding a high-performance classification model. On the other hand, we apply a Hashing-Based Undersampling Ensemble (HUE), which constructs diversified training subspaces for undersampling, to alleviate the class imbalance problem. A set of experiments is performed on two large open-source projects (MySQL and Linux) with six different machine learning classifiers. We use Balance and AUC as the evaluation metrics. Experimental results indicate that QUIRE-HUE achieves encouraging results: the average AUC and Balance are 0.769 and 0.812, respectively, on the MySQL dataset and 0.772 and 0.828, respectively, on the Linux dataset, significantly outperforming all baseline methods.
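The abstract names Balance as an evaluation metric without defining it. The sketch below assumes the definition commonly used in the defect prediction literature: one minus the normalized Euclidean distance from the point (pf, pd) to the ideal point (0, 1), where pd is the probability of detection (recall) and pf is the probability of false alarm. The function name `balance` and its confusion-matrix arguments are illustrative and are not taken from the paper, which may compute the metric differently.

```python
import math


def balance(tp: int, fn: int, fp: int, tn: int) -> float:
    """Balance as commonly defined in defect prediction (assumed definition).

    pd = tp / (tp + fn)   probability of detection (recall)
    pf = fp / (fp + tn)   probability of false alarm
    Balance = 1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2)
    """
    # Guard against empty classes so the function never divides by zero.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    distance = math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2)
    return 1.0 - distance / math.sqrt(2.0)


if __name__ == "__main__":
    # Hypothetical confusion matrix for an imbalanced ARB dataset:
    # 10 aging-related modules, 90 aging-free modules.
    print(round(balance(tp=8, fn=2, fp=10, tn=80), 3))
```

Balance ranges from 0 to 1, with 1 meaning every ARB-prone module is detected and no false alarms are raised. This makes it less sensitive than plain accuracy to the class imbalance the abstract describes.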
Pages: 1-8
Number of pages: 8