Supervised Contrastive Learning for 3D Cross-Modal Retrieval

Cited by: 0
Authors
Choo, Yeon-Seung [1 ]
Kim, Boeun [2 ]
Kim, Hyun-Sik [1 ]
Park, Yong-Suk [1 ]
Affiliations
[1] Korea Elect Technol Inst KETI, Contents Convergence Res Ctr, Seoul 03924, South Korea
[2] Korea Elect Technol Inst KETI, Artificial Intelligence Res Ctr, Seongnam 13509, South Korea
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, No. 22
Keywords
cross-modal; object retrieval; contrastive learning;
DOI
10.3390/app142210322
CLC Number
O6 [Chemistry];
Discipline Code
0703;
Abstract
Interoperability between different virtual platforms requires the ability to search and transfer digital assets across platforms. Digital assets in virtual platforms are represented in different forms or modalities, such as images, meshes, and point clouds. The cross-modal retrieval of three-dimensional (3D) object representations is challenging due to data representation diversity, making common feature space discovery difficult. Recent studies have focused on obtaining feature consistency within the same classes and modalities using cross-modal center loss. However, center features are sensitive to hyperparameter variations, making cross-modal center loss susceptible to performance degradation. This paper proposes a new 3D cross-modal retrieval method that uses cross-modal supervised contrastive learning (CSupCon) and the fixed projection head (FPH) strategy. Contrastive learning mitigates the influence of hyperparameters by maximizing feature distinctiveness. The FPH strategy prevents gradient updates in the projection network, enabling the focused training of the backbone networks. The proposed method shows mean average precision (mAP) increases of 1.17 and 0.14 in 3D cross-modal object retrieval experiments on the ModelNet10 and ModelNet40 datasets compared to state-of-the-art (SOTA) methods.
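The record does not include implementation details for CSupCon or FPH. The following is a minimal PyTorch sketch of what a cross-modal supervised contrastive loss combined with a frozen projection head could look like, assuming the general SupCon formulation; the class and variable names (CrossModalSupCon, img_feat, pc_feat), the feature dimensions (512, 128), and the temperature value are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch: cross-modal supervised contrastive loss + fixed projection head (FPH).
# Assumes PyTorch; names, dimensions, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalSupCon(nn.Module):
    """Supervised contrastive loss over embeddings pooled from all modalities."""
    def __init__(self, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature

    def forward(self, features: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # features: (N, D) embeddings from every modality; labels: (N,) class ids
        features = F.normalize(features, dim=1)
        sim = features @ features.t() / self.temperature           # pairwise similarities
        mask_self = torch.eye(len(labels), device=labels.device)
        mask_pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float() - mask_self
        # log-softmax over all non-self pairs (numerically stabilized)
        logits = sim - sim.max(dim=1, keepdim=True).values.detach()
        exp_logits = torch.exp(logits) * (1.0 - mask_self)
        log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True) + 1e-12)
        # average log-probability of positives: same class, any modality
        pos_count = mask_pos.sum(dim=1).clamp(min=1.0)
        loss = -(mask_pos * log_prob).sum(dim=1) / pos_count
        return loss.mean()

# FPH strategy as described in the abstract: the projection network receives no
# gradient updates, so training is focused on the backbone networks.
projection = nn.Linear(512, 128)
for p in projection.parameters():
    p.requires_grad_(False)                                        # freeze the head

criterion = CrossModalSupCon(temperature=0.07)
# img_feat, pc_feat: (B, 512) backbone outputs for images and point clouds (dummy data here)
img_feat, pc_feat = torch.randn(8, 512), torch.randn(8, 512)
labels = torch.randint(0, 10, (8,))
features = projection(torch.cat([img_feat, pc_feat], dim=0))       # (2B, 128) shared space
loss = criterion(features, labels.repeat(2))                       # labels shared across modalities
```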
Pages: 13