When less is more: on the value of “co-training” for semi-supervised software defect predictors

Cited by: 0
Authors
Suvodeep Majumder
Joymallya Chakraborty
Tim Menzies
Affiliations
[1] Department of Computer Science, North Carolina State University
Source
Empirical Software Engineering | 2024, Vol. 29
Keywords
Semi-supervised learning; SSL; Self-training; Co-training; Boosting methods; Semi-supervised preprocessing; Clustering-based semi-supervised preprocessing; Intrinsically semi-supervised methods; Graph-based methods; Co-forest; Effort-aware tri-training
DOI
Not available
Abstract
Labeling a module defective or non-defective is an expensive task. Hence, there are often limits on how much labeled data is available for training. Semi-supervised classifiers use far fewer labels for training models. However, there are numerous semi-supervised methods, including self-labeling, co-training, maximal-margin, and graph-based methods, to name a few. Only a handful of these methods have been tested in SE for (e.g.) predicting defects, and even there, those methods have been tested on just a handful of projects. This paper applies a wide range of 55 semi-supervised learners to over 714 projects. We find that semi-supervised “co-training methods” work significantly better than other approaches. Specifically, after labeling just 2.5% of the data, these methods make predictions that are competitive with those using 100% of the data. That said, co-training needs to be used cautiously, since the specific co-training method must be selected carefully according to a user’s goals. Also, we warn that a commonly used co-training method (“multi-view”, where different learners get different sets of columns) does not improve predictions, while adding greatly to the run-time cost (11 hours vs. 1.8 hours). It is an open question, worthy of future work, whether these reductions can be seen in other areas of software analytics. To assist with exploring other areas, all the code used is available at https://github.com/ai-se/Semi-Supervised.
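To make the co-training idea concrete, below is a minimal, self-contained sketch (not the paper’s co-forest or tri-training implementations, and all function names here are hypothetical): two learners are trained on different feature views of the same modules, and each one pseudo-labels unlabeled rows for the other, so that a small labeled seed can grow into a larger training set. For simplicity the learners are hand-rolled nearest-centroid classifiers rather than the ensembles studied in the paper.

```python
import statistics

def fit_centroids(X, y):
    """Nearest-centroid learner: mean feature vector per class."""
    cents = {}
    for cls in set(y):
        rows = [x for x, lbl in zip(X, y) if lbl == cls]
        cents[cls] = [statistics.fmean(col) for col in zip(*rows)]
    return cents

def predict(cents, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(cents, key=lambda c: dist(cents[c], x))

def co_train(view_a, view_b, y_seed, rounds=3):
    """Multi-view co-training: the first len(y_seed) rows are labeled;
    each learner pseudo-labels one unlabeled row per round for the other."""
    n = len(y_seed)
    idx_a, ya = list(range(n)), list(y_seed)   # learner A's labeled pool
    idx_b, yb = list(range(n)), list(y_seed)   # learner B's labeled pool
    unlabeled = list(range(n, len(view_a)))
    for _ in range(rounds):
        if not unlabeled:
            break
        ma = fit_centroids([view_a[i] for i in idx_a], ya)
        mb = fit_centroids([view_b[i] for i in idx_b], yb)
        i = unlabeled.pop(0)
        # A's prediction feeds B's pool, and vice versa
        idx_b.append(i); yb.append(predict(ma, view_a[i]))
        idx_a.append(i); ya.append(predict(mb, view_b[i]))
    ma = fit_centroids([view_a[i] for i in idx_a], ya)
    mb = fit_centroids([view_b[i] for i in idx_b], yb)
    return ma, mb

# Toy defect data: two views (e.g. size metrics vs. churn metrics);
# only the first four rows carry labels.
view_a = [[10], [12], [1], [2], [11], [1.5]]
view_b = [[5], [6], [0], [0.5], [5.5], [0.2]]
ma, mb = co_train(view_a, view_b, ["buggy", "buggy", "clean", "clean"])
```

The sketch also illustrates the cost concern raised above: maintaining separate views and retraining two learners every round is extra work, which is why the paper measures whether multi-view splitting actually pays off.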