Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited by: 7

Authors:

Lin, Dongyun [1]

Li, Yiqun [1]

Cheng, Yi [1]

Prasad, Shitala [1]

Guo, Aiyuan [1]

Cao, Yanpeng [2]

Affiliations:

[1] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore

[2] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Keywords:

Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; SIMILARITY; DIFFUSION;

DOI:

10.1109/TMM.2023.3246229

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

View-based methods have achieved state-of-the-art performance in 3D object retrieval, but they still face two major challenges. The first is how to leverage inter-view correlation to enhance view-level visual features. The second is how to effectively fuse view-level features into a discriminative global descriptor. To address these two challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision-transformer-based feature fusion scheme for 3D object retrieval. Existing methods aggregate only neighboring or adjacent views, which can introduce redundant information. In contrast, we propose a multi-range view aggregation module that enhances individual view representations by aggregating not only neighboring views but also views at different ranges. Furthermore, to generate the global descriptor, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse the view-level features. Extensive experiments on three public datasets, ModelNet40, ShapeNet Core55, and MCB-A, demonstrate the superiority of the proposed network over state-of-the-art methods in 3D object retrieval.
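The two ideas named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' MRVA-Net implementation; the abstract does not give architectural details, so everything concrete here is an assumption made for illustration: views are taken to lie on a circle, `multi_range_aggregate` averages each view with the views `r` steps away for a hypothetical set of ranges, and `mhsa_fuse` applies plain multi-head self-attention with random projection weights, then mean-pools the result into one descriptor.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_range_aggregate(views, ranges=(1, 2, 4)):
    """Enhance each view feature with views at several ranges.

    views: (V, D) array, one D-dim feature per view.
    Assumption: views are ordered on a circle, so np.roll gives the
    views r steps away on either side. The ranges and the plain
    averaging are illustrative choices, not taken from the paper.
    """
    out = np.zeros_like(views)
    for r in ranges:
        left = np.roll(views, r, axis=0)
        right = np.roll(views, -r, axis=0)
        out += (views + left + right) / 3.0
    return out / len(ranges)


def mhsa_fuse(views, wq, wk, wv, num_heads):
    """Fuse view features with multi-head self-attention.

    views: (V, D); wq, wk, wv: (D, D) projection matrices.
    Returns a (D,) global descriptor. Mean pooling over views is an
    assumption; the paper may pool differently.
    """
    num_views, dim = views.shape
    head_dim = dim // num_heads

    def split_heads(x):
        # (V, D) -> (H, V, head_dim)
        return x.reshape(num_views, num_heads, head_dim).transpose(1, 0, 2)

    q = split_heads(views @ wq)
    k = split_heads(views @ wk)
    v = split_heads(views @ wv)

    # Scaled dot-product attention per head: (H, V, V)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim), axis=-1)

    # Merge heads back to (V, D), then pool over views.
    mixed = (attn @ v).transpose(1, 0, 2).reshape(num_views, dim)
    return mixed.mean(axis=0)


# Toy usage: 12 views with 64-dim features and random weights.
rng = np.random.default_rng(0)
views = rng.standard_normal((12, 64))
wq, wk, wv = (rng.standard_normal((64, 64)) / 8.0 for _ in range(3))

enhanced = multi_range_aggregate(views)
descriptor = mhsa_fuse(enhanced, wq, wk, wv, num_heads=4)
print(descriptor.shape)
```

In a retrieval setting, descriptors like this would be compared across objects with a distance such as cosine or Euclidean; in a trained network the projection weights would be learned rather than random.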

Pages: 9108 - 9119

Page count: 12

50 items in total

[21] FuseNet: a multi-modal feature fusion network for 3D shape classification
Zhao, Xin
Chen, Yinhuang
Yang, Chengzhuan
Fang, Lincong
VISUAL COMPUTER, 2025, 41 (04) : 2973 - 2985
[22] Depth-Guided Vision Transformer With Normalizing Flows for Monocular 3D Object Detection
Pan, Cong
Peng, Junran
Zhang, Zhaoxiang
IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2024, 11 (03) : 673 - 689
[23] Multi-Level View Associative Convolution Network for View-Based 3D Model Retrieval
Gao, Zan
Zhang, Yan
Zhang, Hua
Guan, Weili
Feng, Dong
Chen, Shengyong
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2264 - 2278
[24] 3D object retrieval based on multi-view convolutional neural networks
Xi-Xi Li
Qun Cao
Sha Wei
Multimedia Tools and Applications, 2017, 76 : 20111 - 20124
[25] 3D object retrieval based on multi-view convolutional neural networks
Li, Xi-Xi
Cao, Qun
Wei, Sha
MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (19) : 20111 - 20124
[26] Multi-Scale Keypoints Feature Fusion Network for 3D Object Detection from Point Clouds
Zhang, Xu
Bai, Linjuan
Zhang, Zuyu
Li, Yan
HUMAN-CENTRIC COMPUTING AND INFORMATION SCIENCES, 2022, 12
[27] TFIENet: Transformer Fusion Information Enhancement Network for Multimodel 3-D Object Detection
Cao, Feng
Jin, Yufeng
Tao, Chongben
Luo, Xizhao
Gao, Zhen
Zhang, Zufeng
Zheng, Sifa
Zhu, Yuan
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
[28] Semantic and Context Information Fusion Network for View-Based 3D Model Classification and Retrieval
Liu, An-An
Guo, Fu-Bin
Zhou, He-Yu
Li, Wen-Hui
Song, Dan
IEEE ACCESS, 2020, 8 : 155939 - 155950
[29] Hierarchical Graph Attention Based Multi-View Convolutional Neural Network for 3D Object Recognition
Zeng, Hui
Zhao, Tianmeng
Cheng, Ruting
Wang, Fuzhou
Liu, Jiwei
IEEE ACCESS, 2021, 9 (09) : 33323 - 33335
[30] Hypergraph based feature fusion for 3-D object retrieval
Wang, Fanglin
Peng, Jialiang
Li, Yongjie
NEUROCOMPUTING, 2015, 151 : 612 - 619