Deep contrastive representation learning for multi-modal clustering

Cited by: 3
Authors
Lu, Yang [1 ,2 ]
Li, Qin [3 ]
Zhang, Xiangdong [1 ]
Gao, Quanxue [1 ]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
[2] Res Inst Air Force, Beijing, Peoples R China
[3] Shenzhen Inst Informat Technol, Sch Software Engn, Shenzhen 518172, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-view representation learning; Self-supervision; Clustering;
DOI
10.1016/j.neucom.2024.127523
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Benefiting from the informative expression capability of contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to simultaneously exploit the similarity information embedded at both the inter- and intra-modal levels. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multimodal contrastive clustering (TMCC), which integrates contrastive learning and adaptively reliable sample selection into multi-modal clustering. Specifically, we design an adaptive filter that trains TMCC by progressing from 'easy' to 'complex' samples. Based on this, using the highly confident clustering labels, we present a new contrastive loss for learning a modal-consensus representation, which accounts for not only the inter-modal similarity but also the intra-modal similarity. Experimental results show that these principles consistently improve TMCC's clustering performance.
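The abstract combines two ideas: an adaptive filter that keeps only confidently clustered ('easy') samples, and a contrastive loss with both an inter-modal term (paired samples across modalities are positives) and an intra-modal term (same-pseudo-label samples within a modality are positives). The sketch below is a minimal NumPy illustration of that general recipe, not the paper's actual implementation; the function name `tmcc_style_loss` and the parameters `tau` and `conf_thresh` are illustrative assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-normalize, then take pairwise dot products.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def tmcc_style_loss(z1, z2, probs, tau=0.5, conf_thresh=0.9):
    """Illustrative loss over two modal embeddings z1, z2 (n x d).

    `probs` (n x k) are soft cluster assignments; only samples whose
    maximum assignment exceeds `conf_thresh` are kept (the 'easy'
    samples of an adaptive filter). Hypothetical sketch, not the
    authors' code.
    """
    keep = probs.max(axis=1) >= conf_thresh
    z1, z2 = z1[keep], z2[keep]
    labels = probs[keep].argmax(axis=1)       # confident pseudo-labels
    n = len(labels)
    if n == 0:
        return 0.0

    # Inter-modal term: InfoNCE-style, the paired sample across
    # modalities is the positive.
    sim = np.exp(cosine_sim(z1, z2) / tau)
    inter = -np.mean(np.log(np.diag(sim) / sim.sum(axis=1)))

    # Intra-modal term: pull together same-pseudo-label samples
    # within modality 1 (shown for one modality for brevity).
    sim1 = np.exp(cosine_sim(z1, z1) / tau)
    np.fill_diagonal(sim1, 0.0)
    same = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    pos = (sim1 * same).sum(axis=1)
    denom = sim1.sum(axis=1)
    has_pos = pos > 0
    intra = (-np.mean(np.log(pos[has_pos] / denom[has_pos]))
             if has_pos.any() else 0.0)

    return inter + intra
```

Raising `conf_thresh` over training would mimic the 'easy'-to-'complex' curriculum the abstract describes: early on only the most confident samples contribute, and more samples enter the loss as the clustering sharpens.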
Pages: 8