Deep contrastive representation learning for multi-modal clustering

Cited by: 3

Authors:

Lu, Yang ^{[1,2]}

Li, Qin ^{[3]}

Zhang, Xiangdong ^{[1]}

Gao, Quanxue ^{[1]}

Affiliations:

[1] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China

[2] Res Inst Air Force, Beijing, Peoples R China

[3] Shenzhen Inst Informat Technol, Sch Software Engn, Shenzhen 518172, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Vol. 581

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-view representation learning; Self-supervision; Clustering;

DOI:

10.1016/j.neucom.2024.127523

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Benefiting from the informative expression capability of contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to simultaneously take into account the similarity information embedded at the inter- and intra-modal levels. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multimodal contrastive clustering (TMCC), which incorporates contrastive learning and adaptive reliable sample selection into multi-modal clustering. Specifically, we focus on an adaptive filter that trains TMCC by progressing from 'easy' to 'complex' samples. Based on this, with the highly confident clustering labels, we present a new contrastive loss to learn a modal-consensus representation, which takes into account not only the inter-modal similarity but also the intra-modal similarity. Experimental results show that these principles in TMCC consistently help improve clustering performance.
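
The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names: keeping confidently labelled samples, then applying a contrastive loss whose positives span both the same modality (intra-modal) and the other modality (inter-modal). It substitutes a generic supervised-contrastive-style loss; it is not the TMCC loss. The function names, the confidence threshold, and the temperature are all assumptions made for illustration.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def select_confident(confidences, threshold):
    """Return indices of samples whose pseudo-label confidence meets the threshold."""
    return [i for i, c in enumerate(confidences) if c >= threshold]


def label_guided_contrastive_loss(z_a, z_b, labels, confidences,
                                  threshold=0.8, temperature=0.5):
    """Hypothetical loss: supervised-contrastive style over two modalities.

    z_a, z_b    : per-sample embeddings from modality A and modality B
    labels      : pseudo cluster label for each sample
    confidences : confidence of each pseudo-label (assumed to lie in [0, 1])
    """
    keep = select_confident(confidences, threshold)
    # Pool the kept samples of both modalities so that positives can come
    # from the same modality (intra-modal) or the other one (inter-modal).
    items = [(_normalize(z[i]), labels[i]) for z in (z_a, z_b) for i in keep]

    total, counted = 0.0, 0
    for i, (zi, yi) in enumerate(items):
        # Temperature-scaled cosine similarity to every other pooled item.
        logits = {
            j: sum(p * q for p, q in zip(zi, zj)) / temperature
            for j, (zj, _) in enumerate(items) if j != i
        }
        positives = [j for j in logits if items[j][1] == yi]
        if not positives:
            continue
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0
```

One way to mimic the 'easy' to 'complex' progression would be to start with a high `threshold` and lower it as training proceeds, so that less confident samples are admitted later; that schedule is likewise an assumption, not something the abstract specifies.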


Pages: 8
