Deep contrastive representation learning for multi-modal clustering

Cited by: 3
Authors
Lu, Yang [1 ,2 ]
Li, Qin [3 ]
Zhang, Xiangdong [1 ]
Gao, Quanxue [1 ]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, Xian 710071, Shaanxi, Peoples R China
[2] Res Inst Air Force, Beijing, Peoples R China
[3] Shenzhen Inst Informat Technol, Sch Software Engn, Shenzhen 518172, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-view representation learning; Self-supervision; Clustering;
DOI
10.1016/j.neucom.2024.127523
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Benefiting from the informative expression capability of contrastive representation learning (CRL), recent multi-modal learning studies have achieved promising clustering performance. However, existing CRL-based multi-modal clustering methods fail to simultaneously exploit the similarity information embedded at both the inter- and intra-modal levels. In this study, we explore deep multi-modal contrastive representation learning and present a multi-modal learning network, named trustworthy multimodal contrastive clustering (TMCC), which integrates contrastive learning and adaptively reliable sample selection into multi-modal clustering. Specifically, we design an adaptive filter that trains TMCC by progressing from 'easy' to 'complex' samples. Based on this, using the highly confident clustering labels, we present a new contrastive loss for learning a modal-consensus representation, which accounts for not only the inter-modal similarity but also the intra-modal similarity. Experimental results show that these principles consistently improve TMCC's clustering performance.
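The abstract combines two ideas: an adaptive filter that keeps only confidently clustered ('easy') samples, and a contrastive loss with both an inter-modal term (paired samples across modalities are positives) and an intra-modal term (same-pseudo-label samples within a modality are positives). The sketch below is a minimal NumPy illustration of that general recipe, not the paper's actual implementation; the function name `tmcc_style_loss` and the parameters `tau` and `conf_thresh` are illustrative assumptions.

```python
import numpy as np

def cosine_sim(a, b):
    # Row-normalize, then take pairwise dot products.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def tmcc_style_loss(z1, z2, probs, tau=0.5, conf_thresh=0.9):
    """Illustrative loss over two modal embeddings z1, z2 (n x d).

    `probs` (n x k) are soft cluster assignments; only samples whose
    maximum assignment exceeds `conf_thresh` are kept (the 'easy'
    samples of an adaptive filter). Hypothetical sketch, not the
    authors' code.
    """
    keep = probs.max(axis=1) >= conf_thresh
    z1, z2 = z1[keep], z2[keep]
    labels = probs[keep].argmax(axis=1)       # confident pseudo-labels
    n = len(labels)
    if n == 0:
        return 0.0

    # Inter-modal term: InfoNCE-style, the paired sample across
    # modalities is the positive.
    sim = np.exp(cosine_sim(z1, z2) / tau)
    inter = -np.mean(np.log(np.diag(sim) / sim.sum(axis=1)))

    # Intra-modal term: pull together same-pseudo-label samples
    # within modality 1 (shown for one modality for brevity).
    sim1 = np.exp(cosine_sim(z1, z1) / tau)
    np.fill_diagonal(sim1, 0.0)
    same = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    pos = (sim1 * same).sum(axis=1)
    denom = sim1.sum(axis=1)
    has_pos = pos > 0
    intra = (-np.mean(np.log(pos[has_pos] / denom[has_pos]))
             if has_pos.any() else 0.0)

    return inter + intra
```

Raising `conf_thresh` over training would mimic the 'easy'-to-'complex' curriculum the abstract describes: early on only the most confident samples contribute, and more samples enter the loss as the clustering sharpens.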
Pages: 8