Multi-view 3D object retrieval leveraging the aggregation of view and instance attentive features

Cited by: 13
Authors
Lin, Dongyun [1]
Li, Yiqun [1]
Cheng, Yi [1]
Prasad, Shitala [1]
Nwe, Tin Lay [1]
Dong, Sheng [1]
Guo, Aiyuan [1]
Affiliations
[1] ASTAR, Inst Infocomm Res, 1 Fusionopolis Way,21-01 Connexis South Tower, Singapore 138632, Singapore
Keywords
View-based 3D object retrieval; View attention module; Instance attention module; ArcFace loss; Cosine distance triplet-center loss; CONVOLUTIONAL NEURAL-NETWORK
DOI
10.1016/j.knosys.2022.108754
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In multi-view 3D object retrieval tasks, it is pivotal to aggregate visual features extracted from multiple view images to generate a discriminative representation for a 3D object. The existing multi-view convolutional neural network employs view pooling for feature aggregation, which ignores the local view-relevant discriminative information within each view image and the global correlative information across all view images. To leverage both types of information, we propose two self-attention modules, namely, View Attention Module and Instance Attention Module, to learn view and instance attentive features, respectively. The final representation of a 3D object is the aggregation of three features: original, view-attentive, and instance-attentive. Furthermore, we propose employing the ArcFace loss together with the cosine-distance-based triplet-center loss as the metric learning guidance to train our model. As the cosine distance is used to rank the retrieval results, our angular metric learning losses achieve a consistent objective between the training and testing processes, thereby facilitating discriminative feature learning. Extensive experiments and ablation studies are conducted on four publicly available 3D object retrieval datasets to show the superiority of the proposed method over multiple state-of-the-art methods. (C) 2022 Elsevier B.V. All rights reserved.
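The two angular losses and the cosine-distance ranking named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `scale` and `margin` values, and the exact loss forms are assumptions based on the standard ArcFace formulation and on a triplet-center loss with the Euclidean distance swapped for cosine distance. The formulation in the paper may differ.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def cosine_distance(a, b):
    """Pairwise cosine distance (1 - cosine similarity) between rows of a and b."""
    return 1.0 - l2_normalize(a) @ l2_normalize(b).T

def arcface_logits(features, class_weights, labels, scale=30.0, margin=0.5):
    """ArcFace-style logits: add an angular margin to the target-class angle.

    scale and margin are illustrative values, not the paper's settings.
    Handling of theta + margin > pi is omitted for brevity.
    """
    cos = np.clip(l2_normalize(features) @ l2_normalize(class_weights).T, -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    out = cos.copy()
    out[rows, labels] = np.cos(theta[rows, labels] + margin)
    return scale * out

def cosine_triplet_center_loss(features, centers, labels, margin=0.3):
    """Hinge on (distance to own class center) + margin - (distance to nearest other center)."""
    d = cosine_distance(features, centers)   # shape (N, C)
    rows = np.arange(len(labels))
    pos = d[rows, labels]                    # distance to the own-class center
    d_neg = d.copy()
    d_neg[rows, labels] = np.inf             # exclude the own-class center
    neg = d_neg.min(axis=1)                  # nearest other-class center
    return np.maximum(0.0, pos + margin - neg).mean()

def rank_by_cosine(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    return np.argsort(cosine_distance(query[None, :], gallery)[0], kind="stable")
```

Because both losses and the ranking function operate on L2-normalized features, training and retrieval share the same angular notion of similarity, which is the consistency the abstract refers to.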
Pages: 12
Related papers
55 in total
[1]  
[Anonymous], 2016, P EUR WORKSH 3D OBJ
[2]   Ensemble Diffusion for Retrieval [J].
Bai, Song ;
Zhou, Zhichao ;
Wang, Jingdong ;
Bai, Xiang ;
Latecki, Longin Jan ;
Tian, Qi .
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, :774-783
[3]   GIFT: A Real-time and Scalable 3D Shape Search Engine [J].
Bai, Song ;
Bai, Xiang ;
Zhou, Zhichao ;
Zhang, Zhaoxiang ;
Latecki, Longin Jan .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :5023-5032
[4]  
Brock A., 2016, ARXIV PREPRINT ARXIV
[5]   On visual similarity based 3D model retrieval [J].
Chen, DY ;
Tian, XP ;
Shen, YT ;
Ming, OY .
COMPUTER GRAPHICS FORUM, 2003, 22 (03) :223-232
[6]   3DCapsule: Extending the Capsule Architecture to Classify 3D Point Clouds [J].
Cheraghian, Ali ;
Petersson, Lars .
2019 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2019, :1194-1202
[7]   ArcFace: Additive Angular Margin Loss for Deep Face Recognition [J].
Deng, Jiankang ;
Guo, Jia ;
Xue, Niannan ;
Zafeiriou, Stefanos .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :4685-4694
[8]   Lightweight Face Recognition Challenge [J].
Deng, Jiankang ;
Guo, Jia ;
Zhang, Debing ;
Deng, Yafeng ;
Lu, Xiangju ;
Shi, Song .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, :2638-2646
[9]   General-Purpose Deep Point Cloud Feature Extractor [J].
Dominguez, Miguel ;
Dhamdhere, Rohan ;
Petkar, Atir ;
Jain, Saloni ;
Sah, Shagan ;
Ptucha, Raymond .
2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, :1972-1981
[10]   Iterative graph attention memory network for cross-modal retrieval [J].
Dong, Xinfeng ;
Zhang, Huaxiang ;
Dong, Xiao ;
Lu, Xu .
KNOWLEDGE-BASED SYSTEMS, 2021, 226