Class-Aware Pseudo-Labeling for Non-Random Missing Labels in Semi-Supervised Learning

Cited by: 0
Authors
Gui, Qian [1]
Wu, Xinting [1]
Niu, Baoning [1]
Affiliations
[1] Taiyuan Univ Technol, Sch Informat & Comp, Taiyuan, Peoples R China
Keywords
Semi-supervised learning; missing label not at random
DOI
10.1142/S1793351X23640018
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised learning (SSL) is a classic missing-label problem. Existing SSL algorithms typically rely on the assumption that labels are missing completely at random (MCAR), under which labeled and unlabeled data share the same class distribution. The label missing not at random (MNAR) setting is more realistic. In MNAR, the labeled and unlabeled data have different class distributions, which biases label imputation and degrades the performance of SSL models. Existing SSL algorithms perform poorly on tail classes (classes with few training examples) in the MNAR setting, because the pseudo-labels learned from unlabeled data tend to be biased toward head classes (classes with many training examples). To alleviate this issue, we propose class-aware pseudo-labeling (CAPL) for non-random missing labels in SSL, which exploits the unlabeled data by dynamically adjusting the threshold used to select pseudo-labels. Under various MNAR settings, our method achieves up to a 15.0% gain in overall accuracy over FixMatch on CIFAR-10.
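The idea of a class-dependent selection threshold can be illustrated with a minimal sketch. This is not the update rule from the paper, which the abstract does not specify. It assumes a hypothetical scheme in which a class that is predicted less often on the unlabeled batch gets a lower confidence threshold, interpolated between an assumed floor and a base threshold. The names class_aware_thresholds, select_pseudo_labels, base_tau, and floor are illustrative only.

```python
from collections import Counter


def argmax(p):
    """Index of the largest probability in a list."""
    return max(range(len(p)), key=p.__getitem__)


def class_aware_thresholds(probs, num_classes, base_tau=0.95, floor=0.5):
    """Hypothetical per-class thresholds (an assumption, not the paper's rule).

    The most frequently predicted class keeps base_tau; rarer classes get a
    threshold closer to floor, in proportion to how rarely they are predicted.
    """
    counts = Counter(argmax(p) for p in probs)
    top = max(counts.values(), default=0)
    thresholds = []
    for c in range(num_classes):
        ratio = counts.get(c, 0) / top if top else 1.0
        thresholds.append(floor + (base_tau - floor) * ratio)
    return thresholds


def select_pseudo_labels(probs, thresholds):
    """Keep (example index, class) pairs whose confidence clears the class threshold."""
    selected = []
    for i, p in enumerate(probs):
        c = argmax(p)
        if p[c] >= thresholds[c]:
            selected.append((i, c))
    return selected


# Toy batch of softmax outputs: class 0 is the head class, class 1 the tail class.
probs = [[0.97, 0.03], [0.96, 0.04], [0.90, 0.10], [0.30, 0.70]]
thresholds = class_aware_thresholds(probs, num_classes=2)
selected = select_pseudo_labels(probs, thresholds)
```

In this toy batch, a single fixed threshold of 0.95 would reject the only tail-class prediction (confidence 0.70), whereas the lowered tail-class threshold admits it. The floor value limits how far the threshold can drop, a design choice made here only to keep low-confidence predictions out.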
Pages: 531-543
Page count: 13