Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Cited by: 3
Authors
Hatefi, Arezoo [1]
Vu, Xuan-Son [1]
Bhuyan, Monowar [1]
Drewes, Frank [1]
Affiliations
[1] Umea Univ, Dept Comp Sci, Umea, Sweden
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021 | 2021
Keywords
meta pseudo clustering; semi-supervised learning; pseudo labeling
DOI
10.1145/3459637.3482073
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples. To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the clusters aimed at. Our experimental results confirm that the performance of the proposed model improves the state-of-the-art if a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
Pages: 3078-3082
Number of pages: 5
Related papers
50 in total
  • [31] Fuzzy semi-supervised co-clustering for text documents
    Yan, Yang
    Chen, Lihui
    Tjhi, William-Chandra
    FUZZY SETS AND SYSTEMS, 2013, 215 : 74 - 89
  • [32] Semi-supervised Classification Based on Clustering Ensembles
    Chen, Si
    Guo, Gongde
    Chen, Lifei
    ARTIFICIAL INTELLIGENCE AND COMPUTATIONAL INTELLIGENCE, PROCEEDINGS, 2009, 5855 : 629 - 638
  • [33] Semi-supervised Surgical Tool Detection Based on Highly Confident Pseudo Labeling and Strong Augmentation Driven Consistency
    Jiang, Wenjing
    Xia, Tong
    Wang, Zhiqiong
    Jia, Fucang
    DEEP GENERATIVE MODELS, AND DATA AUGMENTATION, LABELLING, AND IMPERFECTIONS, 2021, 13003 : 154 - 162
  • [34] Pseudo-Labeling Based Practical Semi-Supervised Meta-Training for Few-Shot Learning
    Dong, Xingping
    Ouyang, Tianran
    Liao, Shengcai
    Du, Bo
    Shao, Ling
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 5663 - 5675
  • [35] Dual Pseudo Supervision for Semi-Supervised Text Classification with a Reliable Teacher
    Li, Shujie
    Yang, Min
    Li, Chengming
    Xu, Ruifeng
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2513 - 2518
  • [36] SEMI-SUPERVISED SPECTRAL CLUSTERING
    Mai, Xiaoyi
    Couillet, Romain
    2018 CONFERENCE RECORD OF 52ND ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, AND COMPUTERS, 2018, : 2012 - 2016
  • [37] Heterogeneous Network Based Semi-supervised Learning for Scene Text Recognition
    Jiang, Qianyi
    Song, Qi
    Li, Nan
    Zhang, Rui
    Wei, Xiaolin
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 64 - 78
  • [38] Semi-supervised consensus clustering based on closed patterns
    Yang, Tianshu
    Pasquier, Nicolas
    Precioso, Frederic
    KNOWLEDGE-BASED SYSTEMS, 2022, 235
  • [39] Fast Semi-supervised Classification Based on Bisecting Clustering
    Liu, Xiaolan
    Hao, Zhifeng
    Liu, Jingao
    Lin, Zhiyong
    2ND IEEE INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER CONTROL (ICACC 2010), VOL. 4, 2010, : 207 - 211
  • [40] P-PseudoLabel: Enhanced Pseudo-Labeling Framework With Network Pruning in Semi-Supervised Learning
    Ham, Gyeongdo
    Cho, Yucheol
    Lee, Jae-Hyeok
    Kim, Daeshik
    IEEE ACCESS, 2022, 10 : 115652 - 115662