Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Cited by: 3
Authors
Hatefi, Arezoo [1]
Vu, Xuan-Son [1]
Bhuyan, Monowar [1]
Drewes, Frank [1]
Affiliations
[1] Umea Univ, Dept Comp Sci, Umea, Sweden
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021 | 2021
Keywords
meta pseudo clustering; semi-supervised learning; pseudo labeling
DOI
10.1145/3459637.3482073
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples. To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the target clusters. Our experimental results confirm that the proposed model improves on the state of the art if a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
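The pseudo-labeling idea mentioned in the abstract can be illustrated with a toy sketch. This is not the Cformer implementation: the bag-of-words centroid "models", the function names, and the example documents are all assumptions made for illustration, and the feedback step in which Meta Pseudo Labels updates the teacher from the student's performance on labeled data is omitted. The sketch only shows a teacher fitted on a few labeled documents assigning pseudo labels to unlabeled documents, and a student fitted on those pseudo-labeled documents.

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words vector as a Counter of lowercase tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fit_centroids(texts, labels):
    """Toy 'model': one summed bag-of-words vector per cluster."""
    centroids = {}
    for text, label in zip(texts, labels):
        centroids.setdefault(label, Counter()).update(bow(text))
    return centroids

def predict(centroids, text):
    """Assign the cluster whose centroid is most similar to the text."""
    vec = bow(text)
    # sorted() keeps tie-breaking deterministic
    return max(sorted(centroids), key=lambda c: cosine(centroids[c], vec))

# Hypothetical data: a few labeled documents describe the clusters,
# while most documents are unlabeled.
labeled_texts = ["football match goal", "election vote parliament"]
labeled_y = ["sports", "politics"]
unlabeled = [
    "goal scored in the football league",
    "parliament vote on the budget",
    "league match tonight",
    "budget election debate",
]

# Teacher: fitted on the labeled examples only.
teacher = fit_centroids(labeled_texts, labeled_y)
# Pseudo labeling: the teacher labels the unlabeled documents.
pseudo_y = [predict(teacher, t) for t in unlabeled]
# Student: fitted on the pseudo-labeled documents only.
student = fit_centroids(unlabeled, pseudo_y)
```

In this toy setup the student picks up vocabulary such as "league" and "budget" that never appears in the labeled examples, which is the benefit pseudo labeling aims for when labeled data is scarce.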
Pages: 3078 - 3082
Number of pages: 5