CoLAL: Co-learning Active Learning for Text Classification

Citations: 0
Authors
Le, Linh [1]
Zhao, Genghong [2]
Zhang, Xia [3]
Zuccon, Guido [1]
Demartini, Gianluca [1]
Affiliations
[1] Univ Queensland, St Lucia, Qld, Australia
[2] Neusoft Res Intelligent Healthcare Technol Co Ltd, Shenyang, Peoples R China
[3] Neusoft Corp, Shenyang, Peoples R China
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12 | 2024
Funding
Swiss National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning, learning effectively from limited data has become increasingly important. Active Learning (AL) algorithms play a significant role here by enhancing model performance. We introduce a novel AL algorithm, termed Co-learning (CoLAL), designed to select the most diverse and representative samples within a training dataset. The approach uses noisy labels together with predictions made by the primary model on unlabeled data. By leveraging a probabilistic graphical model, we combine two multi-class classifiers into a binary one. This binary classifier determines whether the main model and the peer model agree on a prediction. If they do, the unlabeled sample is assumed to be easy to classify and therefore of little benefit for improving the target model's performance. We prioritize data that represents the unlabeled set without overlapping decision boundaries. The discrepancies between these boundaries can be estimated by the probability that the two models produce the same prediction. Through theoretical analysis and experimental validation, we show that integrating noisy labels into the peer model effectively identifies the target model's potential inaccuracies. We evaluated CoLAL on seven benchmark datasets: four text datasets (AGNews, DBPedia, PubMed, SST-2), compared against text-based state-of-the-art (SOTA) baselines, and three image datasets (CIFAR100, MNIST, OpenML-155), compared against computer vision SOTA baselines. The results show that CoLAL significantly outperforms existing SOTA methods in text-based AL and is competitive with SOTA image-based AL techniques.
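The agreement-based selection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the two classifiers predict independently given the input, so the probability that they agree is the sum over classes of the product of their class probabilities, and it omits the probabilistic graphical model, the noisy-label training of the peer model, and the representativeness criterion. The function names `agreement_probability` and `select_for_labeling` and the example probabilities are hypothetical.

```python
from typing import List, Sequence


def agreement_probability(p_main: Sequence[float], p_peer: Sequence[float]) -> float:
    """Probability that both models predict the same class.

    Assumption (not from the paper): the two predictions are independent,
    so P(agree) = sum_k p_main[k] * p_peer[k].
    """
    if len(p_main) != len(p_peer):
        raise ValueError("class distributions must have the same length")
    return sum(a * b for a, b in zip(p_main, p_peer))


def select_for_labeling(
    main_probs: Sequence[Sequence[float]],
    peer_probs: Sequence[Sequence[float]],
    budget: int,
) -> List[int]:
    """Return indices of the `budget` samples with the lowest agreement probability."""
    scores = [agreement_probability(m, p) for m, p in zip(main_probs, peer_probs)]
    # Low agreement suggests the sample is hard, so it is ranked first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:budget]


# Hypothetical class probabilities for three unlabeled samples, three classes.
main_probs = [[0.9, 0.05, 0.05], [0.4, 0.35, 0.25], [0.1, 0.8, 0.1]]
peer_probs = [[0.85, 0.1, 0.05], [0.2, 0.3, 0.5], [0.7, 0.2, 0.1]]

picked = select_for_labeling(main_probs, peer_probs, budget=2)
print(picked)
```

In this toy data the first sample has high agreement (both models favor class 0), so it is treated as easy and left unlabeled, while the two samples on which the models disagree are selected.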
Pages: 13337-13345
Page count: 9