Cross-Modal Center Loss for 3D Cross-Modal Retrieval

Times cited: 53
Authors
Jing, Longlong [1]
Vahdani, Elahe [1]
Tan, Jiaxing [1]
Tian, Yingli [1]
Affiliation
[1] CUNY, New York, NY 10021 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Keywords
REPRESENTATION; ENSEMBLE;
DOI
10.1109/CVPR46437.2021.00316
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Cross-modal retrieval aims to learn discriminative and modal-invariant features for data from different modalities. Unlike existing methods, which usually learn from features extracted by offline networks, in this paper we propose an approach that jointly trains the components of the cross-modal retrieval framework with metadata, enabling the network to find optimal features. The proposed end-to-end framework is updated with three loss functions: 1) a novel cross-modal center loss to eliminate cross-modal discrepancy, 2) cross-entropy loss to maximize inter-class variations, and 3) mean-square-error loss to reduce modality variations. In particular, our proposed cross-modal center loss minimizes the distances of features from objects belonging to the same class across all modalities. Extensive experiments have been conducted on retrieval tasks across multiple modalities, including 2D images, 3D point clouds, and mesh data. The proposed framework significantly outperforms state-of-the-art methods for both cross-modal and in-domain retrieval of 3D objects on the ModelNet10 and ModelNet40 datasets.
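The three loss terms named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the averaging over samples, the pairwise form of the MSE term, and the weights `alpha`, `beta`, `gamma` are assumptions made for illustration only. What it is meant to show is the core idea of the cross-modal center loss: every modality's feature of an object is pulled toward one class center shared by all modalities. In a trained network the centers would be learnable parameters; here they are simply given.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_modal_center_loss(features: Dict[str, List[Vector]],
                            labels: List[int],
                            centers: Dict[int, Vector]) -> float:
    """Pull each modality's feature toward ONE center per class.

    features[m][i] is the feature of object i in modality m. All modalities
    share the same center c_y, which is what makes the loss cross-modal.
    The 1/2 factor and averaging over samples are assumptions of this sketch.
    """
    total, count = 0.0, 0
    for feats in features.values():
        for f, y in zip(feats, labels):
            total += 0.5 * squared_distance(f, centers[y])
            count += 1
    return total / count if count else 0.0


def cross_entropy(logits: List[Vector], labels: List[int]) -> float:
    """Mean softmax cross-entropy, stabilized by subtracting the max logit."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[y]
    return total / len(labels)


def modality_mse_loss(features: Dict[str, List[Vector]]) -> float:
    """Mean squared error between features of the SAME object in each
    pair of modalities (pairwise form is an assumption of this sketch)."""
    total, count = 0.0, 0
    for a, b in combinations(sorted(features), 2):
        for fa, fb in zip(features[a], features[b]):
            total += squared_distance(fa, fb) / len(fa)
            count += 1
    return total / count if count else 0.0


def total_loss(features: Dict[str, List[Vector]],
               logits: Dict[str, List[Vector]],
               labels: List[int],
               centers: Dict[int, Vector],
               alpha: float = 1.0, beta: float = 1.0, gamma: float = 1.0) -> float:
    """Weighted sum of the three terms; alpha/beta/gamma are hypothetical
    weights, not values taken from the paper."""
    ce = sum(cross_entropy(logits[m], labels) for m in logits)
    return (alpha * ce
            + beta * cross_modal_center_loss(features, labels, centers)
            + gamma * modality_mse_loss(features))
```

For example, with one object of class 0 whose image feature is `[2.0, 0.0]`, whose mesh feature is `[0.0, 0.0]`, and a shared center `[1.0, 0.0]`, the center loss is 0.5 and the MSE term is 2.0: both modalities sit at distance 1 from the common center, yet they still disagree with each other, which is why the sketch keeps the MSE term alongside the center loss.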
Pages: 3141-3150
Page count: 10