Supervised Contrastive Learning for 3D Cross-Modal Retrieval

Cited by: 0
Authors
Choo, Yeon-Seung [1 ]
Kim, Boeun [2 ]
Kim, Hyun-Sik [1 ]
Park, Yong-Suk [1 ]
Affiliations
[1] Korea Electronics Technology Institute (KETI), Contents Convergence Research Center, Seoul 03924, South Korea
[2] Korea Electronics Technology Institute (KETI), Artificial Intelligence Research Center, Seongnam 13509, South Korea
Source
APPLIED SCIENCES-BASEL | 2024, Vol. 14, Issue 22
Keywords
cross-modal; object retrieval; contrastive learning;
DOI
10.3390/app142210322
Chinese Library Classification
O6 [Chemistry];
Discipline Code
0703 ;
Abstract
Interoperability between different virtual platforms requires the ability to search for and transfer digital assets across platforms. Digital assets in virtual platforms are represented in different forms or modalities, such as images, meshes, and point clouds. The cross-modal retrieval of three-dimensional (3D) object representations is challenging because this diversity of data representations makes it difficult to discover a common feature space. Recent studies have focused on obtaining feature consistency within the same classes and modalities using cross-modal center loss. However, center features are sensitive to hyperparameter variations, making cross-modal center loss susceptible to performance degradation. This paper proposes a new 3D cross-modal retrieval method that uses cross-modal supervised contrastive learning (CSupCon) and a fixed projection head (FPH) strategy. Contrastive learning mitigates the influence of hyperparameters by maximizing feature distinctiveness. The FPH strategy prevents gradient updates in the projection network, enabling focused training of the backbone networks. The proposed method shows mean average precision (mAP) gains of 1.17 and 0.14 in 3D cross-modal object retrieval experiments on the ModelNet10 and ModelNet40 datasets, respectively, compared to state-of-the-art (SOTA) methods.
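The supervised contrastive objective that CSupCon builds on can be sketched as follows. This is an illustrative NumPy implementation of the standard SupCon loss (Khosla et al.-style), not the authors' code; the function name `supcon_loss` and the toy inputs are assumptions for the example. Under the paper's FPH strategy, the features passed in would come from a projection head whose weights are simply excluded from gradient updates, so only the backbone networks are trained.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.07):
    """Supervised contrastive loss over a batch of embeddings.

    features: (N, D) array of embeddings (normalized internally)
    labels:   (N,) integer class labels; same-label pairs are positives
    """
    n = features.shape[0]
    # L2-normalize so similarities are cosine similarities
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = feats @ feats.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    not_self = ~np.eye(n, dtype=bool)           # exclude the anchor itself
    exp_sim = np.exp(sim) * not_self
    # log p(a | i) over all non-anchor samples a
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    counts = pos.sum(axis=1)
    valid = counts > 0                          # anchors with >= 1 positive
    mean_log_prob_pos = (log_prob * pos).sum(axis=1)[valid] / counts[valid]
    return float(-mean_log_prob_pos.mean())
```

The loss is small when same-class embeddings cluster tightly and large when positives are far apart, which is the "maximizing feature distinctiveness" behavior the abstract attributes to contrastive learning; in a cross-modal setting, the batch would mix image, mesh, and point-cloud embeddings sharing class labels.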
Pages: 13