Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Cited by: 3
Authors
Hatefi, Arezoo [1]
Vu, Xuan-Son [1]
Bhuyan, Monowar [1]
Drewes, Frank [1]
Affiliations
[1] Umea Univ, Dept Comp Sci, Umea, Sweden
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021 | 2021
Keywords
meta pseudo clustering; semi-supervised learning; pseudo labeling
DOI
10.1145/3459637.3482073
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples. To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the target clusters. Our experimental results confirm that the proposed model improves on the state of the art when a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
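The teacher-student idea mentioned in the abstract can be illustrated with a minimal sketch of plain pseudo labeling. This is not the authors' implementation (see the linked repository for that). It assumes toy 2-D vectors in place of text embeddings and a nearest-centroid model in place of a neural network, and it omits the step in Meta Pseudo Labels where the teacher is updated from the student's performance on labeled data. The function names and the labels `sports` and `finance` are hypothetical.

```python
from collections import defaultdict
from math import dist


def centroids(points, labels):
    """Return the mean vector of the points assigned to each label."""
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        groups[y].append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps)) for y, ps in groups.items()}


def predict(model, point):
    """Assign the label of the nearest centroid."""
    return min(model, key=lambda y: dist(model[y], point))


def pseudo_label_round(labeled, labels, unlabeled):
    """One teacher-to-student round of plain pseudo labeling (no teacher feedback)."""
    teacher = centroids(labeled, labels)               # teacher sees labeled data only
    pseudo = [predict(teacher, p) for p in unlabeled]  # teacher labels the unlabeled pool
    # Student is fit on the labeled data plus the pseudo-labeled data.
    student = centroids(labeled + unlabeled, labels + pseudo)
    return teacher, student, pseudo


# Toy data (assumed): one labeled example per cluster, the rest unlabeled.
labeled = [(0.0, 0.0), (10.0, 10.0)]
labels = ["sports", "finance"]
unlabeled = [(1.0, 2.0), (2.0, 1.0), (9.0, 8.0), (8.0, 9.0)]

teacher, student, pseudo = pseudo_label_round(labeled, labels, unlabeled)
print(pseudo)   # pseudo labels chosen by the teacher
print(student)  # student centroids after using the unlabeled pool
```

In this toy setup the student's centroids move toward the bulk of each cluster, which is the benefit the unlabeled data is meant to provide when only a few labeled examples describe the clusters.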
Pages: 3078-3082
Number of pages: 5
Related papers
50 in total
  • [1] Integrating pseudo labeling with contrastive clustering for transformer-based semi-supervised action recognition
    Li, Nannan
    Huang, Kan
    Wu, Qingtian
    Zhao, Yang
    APPLIED INTELLIGENCE, 2024, 54 (22) : 11177 - 11195
  • [2] Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition
    Higuchi, Yosuke
    Moritz, Niko
    Le Roux, Jonathan
    Hori, Takaaki
    INTERSPEECH 2021, 2021, : 726 - 730
  • [3] Compressed video ensemble based pseudo-labeling for semi-supervised action recognition
    Terao, Hayato
    Noguchi, Wataru
    Iizuka, Hiroyuki
    Yamamoto, Masahito
    MACHINE LEARNING WITH APPLICATIONS, 2022, 9
  • [4] Pseudo-labeling Algorithm Based on Optimal Transport for Deep Semi-supervised Learning
    Zhai, De-Ming
    Shen, Si-Xian
    Zhou, Xiong
    Jiang, Jun-Jun
    Liu, Xian-Ming
    Ji, Xiang-Yang
    Ruan Jian Xue Bao/Journal of Software, 2024, 35 (11) : 5196 - 5209
  • [5] Semi-supervised Malicious Domain Detection Based on Meta Pseudo Labeling
    Gao, Yi
    Yuan, Fangfang
    Yang, Jinglin
    Wang, Dakui
    Cao, Cong
    Liu, Yanbing
    COMPUTATIONAL SCIENCE, ICCS 2024, PT II, 2024, 14833 : 312 - 324
  • [6] A Semi-Supervised Learning Method for Spiking Neural Networks Based on Pseudo-Labeling
    Nguyen, Thao N. N.
    Veeravalli, Bharadwaj
    Fong, Xuanyao
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [7] GENERALIZED PSEUDO-LABELING IN CONSISTENCY REGULARIZATION FOR SEMI-SUPERVISED LEARNING
    Karaliolios, Nikolaos
    Chabot, Florian
    Dupont, Camille
    Le Borgne, Herve
    Quoc-Cuong Pham
    Audigier, Romaric
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 525 - 529
  • [8] Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
    Zhu H.
    Gao D.
    Cheng G.
    Povey D.
    Zhang P.
    Yan Y.
    IEEE/ACM Transactions on Audio Speech and Language Processing, 2023, 31 : 3320 - 3330
  • [9] Text classification with enhanced semi-supervised fuzzy clustering
    Keswani, G
    Hall, LO
    PROCEEDINGS OF THE 2002 IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOL 1 & 2, 2002, : 621 - 626
  • [10] Pseudo-Labeling Optimization Based Ensemble Semi-Supervised Soft Sensor in the Process Industry
    Li, Youwei
    Jin, Huaiping
    Dong, Shoulong
    Yang, Biao
    Chen, Xiangguang
    SENSORS, 2021, 21 (24)