Sample-based software defect prediction with active and semi-supervised learning

Cited by: 158
Authors
Li, Ming [2]
Zhang, Hongyu [1]
Wu, Rongxin [1]
Zhou, Zhi-Hua [2]
Affiliations
[1] Tsinghua Univ, MOE Key Lab Informat Syst Secur, Beijing 100084, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Funding
National Science Foundation (USA)
Keywords
Software defect prediction; Sampling; Quality assurance; Machine learning; Active semi-supervised learning; STATIC CODE ATTRIBUTES; CLASSIFICATION; FRAMEWORK;
DOI
10.1007/s10515-011-0092-1
Chinese Library Classification
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Software defect prediction can help us better understand and control software quality. Current defect prediction techniques are mainly based on a sufficient amount of historical project data. However, historical data is often not available for new projects and for many organizations. In this case, effective defect prediction is difficult to achieve. To address this problem, we propose sample-based methods for software defect prediction. For a large software system, we can select and test a small percentage of modules, and then build a defect prediction model to predict the defect-proneness of the rest of the modules. In this paper, we describe three methods for selecting a sample: random sampling with conventional machine learners, random sampling with a semi-supervised learner, and active sampling with an active semi-supervised learner. To facilitate active sampling, we propose a novel active semi-supervised learning method, ACoForest, which is able to sample the modules that are most helpful for learning a good prediction model. Our experiments on PROMISE datasets show that the proposed methods are effective and have the potential to be applied to industrial practice.
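The first of the three methods in the abstract (random sampling with a conventional learner) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `fit_stump` and `sample_based_prediction`, the `inspect` callback standing in for testing a module, the single numeric metric per module, and the one-threshold decision stump used in place of whichever conventional learners the authors evaluated. It does not implement ACoForest.

```python
import random


def fit_stump(labeled):
    """Pick a threshold t for the rule "metric > t means defective".

    labeled: list of (metric_value, is_defective) pairs from inspected modules.
    Returns the candidate threshold with the fewest training errors.
    """
    values = sorted({v for v, _ in labeled})
    # Candidates: below all values, midpoints between neighbours, above all values.
    candidates = (
        [values[0] - 1]
        + [(a + b) / 2 for a, b in zip(values, values[1:])]
        + [values[-1] + 1]
    )

    def errors(t):
        return sum((v > t) != y for v, y in labeled)

    return min(candidates, key=errors)


def sample_based_prediction(modules, inspect, sample_rate=0.1, seed=0):
    """Randomly sample a fraction of modules, inspect only those, predict the rest.

    modules: dict mapping module name -> one numeric metric (hypothetical, e.g. size).
    inspect: callable(name) -> bool, the costly step of testing a module.
    Returns (sampled_names, predictions), where predictions maps every
    unsampled module name to a predicted defect-proneness flag.
    """
    rng = random.Random(seed)
    names = sorted(modules)
    k = max(2, int(len(names) * sample_rate))
    sampled = rng.sample(names, k)

    # Only the sampled modules are actually tested.
    labeled = [(modules[n], inspect(n)) for n in sampled]
    threshold = fit_stump(labeled)

    sampled_set = set(sampled)
    predictions = {
        n: modules[n] > threshold for n in names if n not in sampled_set
    }
    return sampled, predictions
```

With 200 modules and `sample_rate=0.1`, only 20 modules are passed to `inspect`; the remaining 180 receive predictions from the model fitted on that sample. A semi-supervised or active variant would additionally use the unlabeled modules during training or choose which modules to inspect, which this sketch does not attempt.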
Pages: 201-230
Page count: 30