SCAE: Structural Contrastive Auto-Encoder for Incomplete Multi-View Representation Learning

Citations: 0

Authors:

Li, Mengran [1]

Zhang, Ronghui [1]

Zhang, Yong [2]

Piao, Xinglin [2]

Zhao, Shiyu [2]

Yin, Baocai [2]

Affiliations:

[1] Sun Yat Sen Univ, Sch Intelligent Syst Engn, Guangdong Prov Key Lab Intelligent Transport Syst, Guangzhou 510006, Peoples R China

[2] Beijing Univ Technol, Beijing Inst Artificial Intelligence, Dept Informat Sci, Beijing Key Lab Multimedia & Intelligent Software, Beijing, Peoples R China

Source:

ACM Transactions on Multimedia Computing, Communications, and Applications | 2024, Vol. 20, No. 9

Funding:

National Natural Science Foundation of China

Keywords:

Incomplete multi-view representation learning; MC-VAE; Dirichlet energy; mutual information maximization; contrastive learning; classification

DOI:

10.1145/3672078

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Describing an object from multiple perspectives often leads to incomplete data representation. Consequently, learning consistent representations for missing data from multiple views has emerged as a key focus in the realm of Incomplete Multi-view Representation Learning (IMRL). In recent years, various strategies, such as subspace learning, matrix decomposition, and deep learning, have been harnessed to develop numerous IMRL methods. In this article, our primary research revolves around IMRL, with a particular emphasis on addressing two main challenges. Firstly, we delve into the effective integration of intra-view similarity and contextual structure into a unified framework. Secondly, we explore the effective facilitation of information exchange and fusion across multiple views. To tackle these issues, we propose a deep learning approach known as Structural Contrastive Auto-Encoder (SCAE) to solve the challenges of IMRL. SCAE comprises two major components: intra-view structural representation learning and inter-view contrastive representation learning. The former involves capturing intra-view similarity by minimizing the Dirichlet energy of the feature matrix, while also applying spatial dispersion regularization to capture intra-view contextual structure. The latter encourages maximizing the mutual information of inter-view representations, facilitating information exchange and fusion across views. Experimental results demonstrate the efficacy of our approach in significantly enhancing model accuracy and robustly addressing IMRL problems. The code is available at https://github.com/limengran98/SCAE.
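The two quantities named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the SCAE repository and should not be read as the authors' implementation. It assumes a symmetric adjacency matrix A over the samples of one view, so the Dirichlet energy of a feature matrix X is tr(X^T L X) with L = D - A, and it uses the InfoNCE loss, a standard contrastive surrogate whose minimization raises a lower bound on mutual information, as a stand-in for the inter-view objective. The function names dirichlet_energy and info_nce and the temperature tau are hypothetical choices made for this illustration.

```python
import numpy as np


def dirichlet_energy(X, A):
    """Dirichlet energy tr(X^T L X), with L = D - A.

    X: (n, d) feature matrix of one view.
    A: (n, n) symmetric, non-negative adjacency matrix (assumed given).
    For symmetric A this equals 0.5 * sum_ij A_ij * ||x_i - x_j||^2,
    so it is small when connected samples have similar features.
    """
    L = np.diag(A.sum(axis=1)) - A  # unnormalized graph Laplacian
    return float(np.trace(X.T @ L @ X))


def info_nce(Z1, Z2, tau=0.5):
    """InfoNCE loss between two views (illustrative stand-in, not SCAE's loss).

    Z1, Z2: (n, d) representations; row i of Z1 and row i of Z2 describe
    the same sample and form the positive pair, all other rows are negatives.
    """
    # Cosine similarity: L2-normalize each row before the dot product.
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    logits = Z1 @ Z2.T / tau
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability of the positive (diagonal) pairs.
    return float(-np.mean(np.diag(log_prob)))
```

In this sketch, constant features give zero Dirichlet energy, and two views whose rows are correctly paired give a lower InfoNCE loss than the same views with the pairing shifted by one row.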


Pages: 24

