When less is more: on the value of “co-training” for semi-supervised software defect predictors

Cited: 0
Authors
Suvodeep Majumder
Joymallya Chakraborty
Tim Menzies
Affiliations
[1] North Carolina State University, Department of Computer Science
Source
Empirical Software Engineering | 2024 / Volume 29
Keywords
Semi-supervised learning; SSL; Self-training; Co-training; Boosting methods; Semi-supervised preprocessing; Clustering-based semi-supervised preprocessing; Intrinsically semi-supervised methods; Graph-based methods; Co-forest; Effort aware tri-training;
DOI: not available
Abstract
Labeling a module as defective or non-defective is an expensive task. Hence, there are often limits on how much labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models. However, there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to name a few. Only a handful of these methods have been tested in SE for (e.g.) predicting defects, and even there, those methods have been tested on just a handful of projects. This paper applies a wide range of 55 semi-supervised learners to over 714 projects. We find that semi-supervised “co-training methods” work significantly better than other approaches. Specifically, after labeling just 2.5% of the data, they make predictions that are competitive with those using 100% of the data. That said, co-training needs to be used cautiously, since the specific co-training method must be chosen carefully according to a user’s goals. Also, we warn that a commonly used co-training method (“multi-view”, where different learners get different sets of columns) does not improve predictions while adding considerably to run time (11 hours vs. 1.8 hours). It is an open question, worthy of future work, whether these reductions can be seen in other areas of software analytics. To assist with exploring other areas, all the code used is available at https://github.com/ai-se/Semi-Supervised.
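For readers unfamiliar with the technique, the following is a minimal co-training sketch in Python. It is not the paper's implementation (that lives at the GitHub link above); it assumes scikit-learn is available and uses two hypothetical learners, a random forest and a logistic regression, that repeatedly pseudo-label their most confident unlabeled modules for each other, starting from a small labeled pool (e.g. 2.5% of the data).

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    def co_train(X_lab, y_lab, X_unlab, rounds=10, add_per_round=20):
        """Grow a small labeled pool by letting two diverse learners
        pseudo-label confident examples for each other."""
        clf_a = RandomForestClassifier(random_state=0)
        clf_b = LogisticRegression(max_iter=1000)
        X_lab, y_lab = np.asarray(X_lab), np.asarray(y_lab)
        X_unlab = np.asarray(X_unlab)
        for _ in range(rounds):
            for teacher in (clf_a, clf_b):
                if len(X_unlab) == 0:
                    break
                teacher.fit(X_lab, y_lab)
                probs = teacher.predict_proba(X_unlab)
                # Pick the unlabeled rows this learner is most confident about ...
                top = np.argsort(-probs.max(axis=1))[:add_per_round]
                pseudo = teacher.classes_[probs[top].argmax(axis=1)]
                # ... and move them, with their pseudo-labels, into the shared
                # labeled pool that the other learner trains on next.
                X_lab = np.vstack([X_lab, X_unlab[top]])
                y_lab = np.concatenate([y_lab, pseudo])
                X_unlab = np.delete(X_unlab, top, axis=0)
        return clf_a, clf_b

A “multi-view” variant would additionally give each learner a different subset of columns; the abstract above reports that this extra step did not improve predictions while greatly increasing run time.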
Related Papers
50 items in total
  • [41] A Semi-Supervised Approach to Software Defect Prediction
    Lu, Huihua
    Cukic, Bojan
    Culp, Mark
    2014 IEEE 38TH ANNUAL INTERNATIONAL COMPUTERS, SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), 2014, : 416 - 425
  • [42] RFID indoor positioning based on semi-supervised actor-critic co-training
    Li L.
    Jiali Z.
    Yixuan Q.
    Zihan L.
    Yingchao L.
    Tianxing H.
    Journal of China Universities of Posts and Telecommunications, 2020, 27 (05): : 69 - 81
  • [43] Co-training Semi-supervised Learning for Single-Target Regression in Data Streams Using AMRules
    Sousa, Ricardo
    Gama, Joao
    FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 499 - 508
  • [44] Semi-Supervised Co-Training Model Using Convolution and Transformer for Hyperspectral Image Classification
    Zhao, Feng
    Song, Xiqun
    Zhang, Junjie
    Liu, Hanqiang
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [45] HIGH ACCURATE INTERNET TRAFFIC CLASSIFICATION BASED ON CO-TRAINING SEMI-SUPERVISED CLUSTERING
    Li, Xiang
    Qi, Feng
    Yu, Li Kun
    Qiu, Xue Song
    PROCEEDINGS OF THE 2010 INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENCE AND AWARENESS INTERNET, AIAI2010, 2010, : 193 - 197
  • [46] Research on semi-supervised heterogeneous adaptive co-training soft-sensor model
    Li D.
    Huang D.
    Liu Y.
    Huagong Xuebao/CIESC Journal, 2020, 71 (05): : 2128 - 2138
  • [47] Co-training partial least squares model for semi-supervised soft sensor development
    Bao, Liang
    Yuan, Xiaofeng
    Ge, Zhiqiang
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2015, 147 : 75 - 85
  • [48] Semi-Supervised Learning Approach for Indonesian Named Entity Recognition (NER) Using Co-Training Algorithm
    Aryoyudanta, Bayu
    Adji, Teguh Bharata
    Hidayah, Indriana
    2016 INTERNATIONAL SEMINAR ON INTELLIGENT TECHNOLOGY AND ITS APPLICATIONS (ISITIA): RECENT TRENDS IN INTELLIGENT COMPUTATIONAL TECHNOLOGIES FOR SUSTAINABLE ENERGY, 2016, : 7 - 11
  • [49] Multi-head co-training: An uncertainty-aware and robust semi-supervised learning framework
    Chen, Mingcai
    Wang, Chongjun
    KNOWLEDGE-BASED SYSTEMS, 2024, 302
  • [50] Semi-supervised Software Defect Prediction Model Based on Tri-training
    Meng, Fanqi
    Cheng, Wenying
    Wang, Jingdong
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2021, 15 (11): : 4028 - 4042