Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited: 7
Authors
Lin, Dongyun [1 ]
Li, Yiqun [1 ]
Cheng, Yi [1 ]
Prasad, Shitala [1 ]
Guo, Aiyuan [1 ]
Cao, Yanpeng [2 ]
Institutions
[1] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[2] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; SIMILARITY; DIFFUSION;
DOI
10.1109/TMM.2023.3246229
CLC Number
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
View-based methods have achieved state-of-the-art performance in 3D object retrieval. However, they still face two major challenges: how to leverage inter-view correlation to enhance view-level visual features, and how to effectively fuse view-level features into a discriminative global descriptor. To address these challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision-transformer-based feature fusion scheme for 3D object retrieval. Unlike existing methods, which aggregate only neighboring or adjacent views and can therefore introduce redundant information, our multi-range view aggregation module enhances individual view representations by aggregating views at multiple ranges rather than neighboring views alone. Furthermore, to generate the global descriptor from the view-level features, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse them. Extensive experiments on three public datasets, ModelNet40, ShapeNet Core55, and MCB-A, demonstrate that the proposed network outperforms state-of-the-art methods in 3D object retrieval.
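The abstract describes two components: a multi-range view aggregation module and a vision-transformer-style multi-head self-attention fusion of view-level features. The sketch below is a rough, hypothetical PyTorch reading of those two ideas, not the authors' implementation. It assumes the rendered views form a ring around the object (so a view "range" can be modeled as a circular offset along the view axis), and it uses a learnable class token, in the style of the vision transformer, as the readout that becomes the global descriptor. All module names, dimensions, and offsets are illustrative assumptions.

```python
# Hypothetical sketch of the two ideas in the abstract; NOT the authors'
# released code. Names, dimensions, and the definition of a view "range"
# are assumptions for illustration only.
import torch
import torch.nn as nn


def multi_range_aggregate(view_feats, ranges=(1, 2, 4)):
    """Enhance each view feature with views at several circular offsets.

    view_feats: (batch, num_views, dim). The views are assumed to lie on a
    ring of cameras around the object, so offset r wraps around the view axis.
    Plain averaging is a stand-in for whatever aggregation the paper learns.
    """
    enhanced = view_feats
    for r in ranges:
        # torch.roll shifts the view axis, pairing each view with the one
        # r positions away along the ring.
        enhanced = enhanced + torch.roll(view_feats, shifts=r, dims=1)
    return enhanced / (len(ranges) + 1)


class ViewFusionMHSA(nn.Module):
    """Fuse view-level features into one descriptor via self-attention."""

    def __init__(self, dim=512, num_heads=8, num_layers=1):
        super().__init__()
        # Learnable token whose output state serves as the global descriptor,
        # mirroring the [class] token of the vision transformer.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, view_feats):
        # view_feats: (batch, num_views, dim), one feature per rendered view.
        b = view_feats.size(0)
        cls = self.cls_token.expand(b, -1, -1)         # (batch, 1, dim)
        tokens = torch.cat([cls, view_feats], dim=1)   # prepend class token
        fused = self.encoder(tokens)                   # attention over views
        return fused[:, 0]                             # (batch, dim) descriptor


if __name__ == "__main__":
    feats = torch.randn(2, 12, 512)          # 2 objects, 12 views each
    feats = multi_range_aggregate(feats)     # multi-range view enhancement
    descriptor = ViewFusionMHSA()(feats)     # ViT-style fusion
    print(descriptor.shape)                  # torch.Size([2, 512])
```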
Pages: 9108-9119
Number of pages: 12
Related Papers (50 in total)
  • [1] Multi-View Hierarchical Fusion Network for 3D Object Retrieval and Classification
    Liu, An-An
    Hu, Nian
    Song, Dan
    Guo, Fu-Bin
    Zhou, He-Yu
    Hao, Tong
    IEEE ACCESS, 2019, 7 : 153021 - 153030
  • [2] Multi-view convolutional vision transformer for 3D object recognition
    Li, Jie
    Liu, Zhao
    Li, Li
    Lin, Junqin
    Yao, Jian
    Tu, Jingmin
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 95
  • [3] An Improved Multi-View Convolutional Neural Network for 3D Object Retrieval
    He, Xinwei
    Bai, Song
    Chu, Jiajia
    Bai, Xiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 7917 - 7930
  • [4] MFFTNet: A Novel 3D Point Cloud Segmentation Network Based on Multi-Scale Feature Fusion and Transformer Architecture
    Bai, Hao
    Li, Xiongwei
    Meng, Qing
    Zhuo, Shulong
    Yan, Lili
    IEEE ACCESS, 2025, 13 : 9462 - 9472
  • [5] Multi-View 3D Object Retrieval With Deep Embedding Network
    Guo, Haiyun
    Wang, Jinqiao
    Gao, Yue
    Li, Jianqiang
    Lu, Hanqing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (12) : 5526 - 5537
  • [6] VGNet: Multimodal Feature Extraction and Fusion Network for 3D CAD Model Retrieval
    Qin, Feiwei
    Zhan, Gaoyang
    Fang, Meie
    Chen, C. L. Philip
    Li, Ping
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 1432 - 1447
  • [7] Multi-Feature Fusion Based on Multi-View Feature and 3D Shape Feature for Non-Rigid 3D Model Retrieval
    Zeng, Hui
    Wang, Qi
    Liu, Jiwei
    IEEE ACCESS, 2019, 7 : 41584 - 41595
  • [8] Multi-Source Features Fusion Single Stage 3D Object Detection With Transformer
    Tong, Guofeng
    Li, Zheng
    Peng, Hao
    Wang, Yaqi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (04) : 2062 - 2069
  • [9] Multimodal Feature Fusion for 3D Shape Recognition and Retrieval
    Bu, Shuhui
    Cheng, Shaoguang
    Liu, Zhenbao
    Han, Junwei
    IEEE MULTIMEDIA, 2014, 21 (04) : 38 - 46
  • [10] MVPointNet: Multi-View Network for 3D Object Based on Point Cloud
    Zhou, Weiguo
    Jiang, Xin
    Liu, Yun-Hui
    IEEE SENSORS JOURNAL, 2019, 19 (24) : 12145 - 12152