Deep contrastive representation learning for multi-modal clustering

Cited by: 3
Authors
Lu, Yang [1,2]
Li, Qin [3]
Zhang, Xiangdong [1]
Gao, Quanxue [1]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
[2] Res Inst Air Force, Beijing, Peoples R China
[3] Shenzhen Inst Informat Technol, Sch Software Engn, Shenzhen 518172, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multi-view representation learning; Self-supervision; Clustering
DOI
10.1016/j.neucom.2024.127523
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Benefiting from the expressive power of contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to exploit the similarity information embedded at the inter-modal and intra-modal levels simultaneously. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multimodal contrastive clustering (TMCC), which integrates contrastive learning and adaptive reliable-sample selection into multi-modal clustering. Specifically, we design an adaptive filter that trains TMCC progressively, moving from 'easy' to 'complex' samples. Building on this, we use the highly confident clustering labels to define a new contrastive loss for learning a modal-consensus representation, one that accounts for both inter-modal and intra-modal similarity. Experimental results show that these components of TMCC consistently improve clustering performance.
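The abstract describes two mechanisms only at a high level: an adaptive filter that admits 'easy' (high-confidence) samples first and 'complex' ones later, and a contrastive loss over confident pseudo-labels that rewards both inter-modal and intra-modal similarity. The sketch below illustrates one way these ideas could fit together; it is not the authors' TMCC implementation. The linear threshold schedule, the function names `confidence_threshold`, `select_reliable`, and `multimodal_contrastive_loss`, the default temperature, and the use of a supervised-contrastive-style loss (positives are all pairs sharing a pseudo-label, in any modality) are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def confidence_threshold(epoch: int, total_epochs: int,
                         start: float = 0.9, end: float = 0.5) -> float:
    """Hypothetical linear schedule: strict early on (only 'easy' samples pass),
    relaxed later so that 'complex' samples are gradually admitted."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def select_reliable(probs: Sequence[Sequence[float]],
                    threshold: float) -> Tuple[List[int], List[int]]:
    """Keep samples whose largest cluster probability reaches the threshold;
    return their indices and argmax pseudo-labels."""
    indices, labels = [], []
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            indices.append(i)
            labels.append(list(p).index(conf))
    return indices, labels


def multimodal_contrastive_loss(views: Sequence[Sequence[Sequence[float]]],
                                labels: Sequence[int],
                                temperature: float = 0.5) -> float:
    """Supervised-contrastive-style loss over all (modality, sample) pairs.

    views[m][i] is the embedding of selected sample i in modality m (same
    sample order in every modality). For an anchor (m, i), the positives are
    all other pairs (n, j) with labels[j] == labels[i]:
      * n != m, j == i: the same sample seen in another modality (inter-modal)
      * j != i: other samples with the same pseudo-label, in the same
        modality (intra-modal) or in another one
    Every other pair acts as a negative in the softmax denominator.
    """
    z = [[l2_normalize(e) for e in view] for view in views]
    nodes = [(m, i) for m in range(len(z)) for i in range(len(labels))]
    total, anchors = 0.0, 0
    for m, i in nodes:
        # Temperature-scaled cosine similarity to every other node.
        logits = {
            (n, j): sum(a * b for a, b in zip(z[m][i], z[n][j])) / temperature
            for n, j in nodes if (n, j) != (m, i)
        }
        if not logits:
            continue  # a single node has nothing to contrast against
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits.values())
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits.values()))
        positives = [k for k in logits if labels[k[1]] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        total += -sum(logits[k] - log_denom for k in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

In a training loop, one would recompute `confidence_threshold` each epoch, pass the clustering head's soft assignments to `select_reliable`, and evaluate the loss only on the selected rows of every modality's embeddings. In this sketch, lowering the threshold over time is what realizes the easy-to-complex progression.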
Pages: 8