Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited by: 7
Authors
Lin, Dongyun [1]
Li, Yiqun [1]
Cheng, Yi [1]
Prasad, Shitala [1]
Guo, Aiyuan [1]
Cao, Yanpeng [2]
Affiliations
[1] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[2] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; SIMILARITY; DIFFUSION;
DOI
10.1109/TMM.2023.3246229
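Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract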
View-based methods have achieved state-of-the-art performance in 3D object retrieval. However, they still face two major challenges. The first is how to leverage inter-view correlation to enhance view-level visual features. The second is how to effectively fuse view-level features into a discriminative global descriptor. To address these two challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision transformer based feature fusion scheme for 3D object retrieval. Unlike existing methods, which aggregate only neighboring or adjacent views and may therefore bring in redundant information, we propose a multi-range view aggregation module that enhances individual view representations by aggregating not only neighboring views but also views at different ranges. Furthermore, to generate the global descriptor from view-level features, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse the view-level features. Extensive experiments conducted on three public datasets, ModelNet40, ShapeNet Core55 and MCB-A, demonstrate the superiority of the proposed network over state-of-the-art methods in 3D object retrieval.
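The two ideas named in the abstract can be illustrated with a rough NumPy sketch. This is not the MRVA-Net implementation: the abstract does not give the architecture, so the circular view ordering, the offsets in `ranges`, the plain averaging, the random (untrained) projection weights, and the mean pooling are all assumptions chosen only for illustration, and the function names `multi_range_aggregate` and `mhsa_fuse` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_range_aggregate(views, ranges=(1, 2, 4)):
    """Enhance each view with views at several offsets (assumed scheme).

    views: array of shape (V, D), one feature vector per rendered view,
    assumed to be ordered around a circular camera rig.
    For each offset r, every view is averaged with the views r steps
    before and after it; the results are then averaged over all offsets.
    """
    out = np.zeros_like(views)
    for r in ranges:
        out += (views + np.roll(views, r, axis=0) + np.roll(views, -r, axis=0)) / 3.0
    return out / len(ranges)


def mhsa_fuse(views, num_heads=4, seed=0):
    """Fuse view-level features with multi-head self-attention.

    Projection weights are random stand-ins for learned parameters.
    Returns a single global descriptor of shape (D,).
    """
    V, D = views.shape
    assert D % num_heads == 0, "feature size must be divisible by num_heads"
    d = D // num_heads
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))

    def split(x):
        # (V, D) -> (heads, V, d)
        return x.reshape(V, num_heads, d).transpose(1, 0, 2)

    q, k, v = split(views @ Wq), split(views @ Wk), split(views @ Wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, V, V)
    heads = attn @ v                                        # (heads, V, d)
    tokens = heads.transpose(1, 0, 2).reshape(V, D)         # (V, D)
    return tokens.mean(axis=0)  # mean pooling is an assumption


# Example: 12 views with 64-dimensional features (synthetic data).
features = np.random.default_rng(1).standard_normal((12, 64))
descriptor = mhsa_fuse(multi_range_aggregate(features))
```

In a trained network the projections would be learned and the pooled descriptor would be compared across objects for retrieval; here the sketch only shows how the tensor shapes flow from per-view features to one global vector.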
Pages: 9108 - 9119
Page count: 12
Related papers
50 items in total
  • [31] Transformer-Based Optimized Multimodal Fusion for 3D Object Detection in Autonomous Driving
    Alaba, Simegnew Yihunie
    Ball, John E.
    IEEE ACCESS, 2024, 12 : 50165 - 50176
  • [32] Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking With Transformer
    Luo, Zhipeng
    Zhou, Changqing
    Pan, Liang
    Zhang, Gongjie
    Liu, Tianrui
    Luo, Yueru
    Zhao, Haiyu
    Liu, Ziwei
    Lu, Shijian
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09) : 5921 - 5935
  • [33] End-to-End 3D Human Pose Estimation Network With Multi-Layer Feature Fusion
    Cai, Guoci
    Zhang, Changshe
    Xie, Jingxiu
    Pan, Jie
    Li, Chaopeng
    Wu, Yiliang
    IEEE ACCESS, 2024, 12 : 89124 - 89134
  • [34] Multi-view dual attention network for 3D object recognition
    Wenju Wang
    Yu Cai
    Tao Wang
    Neural Computing and Applications, 2022, 34 : 3201 - 3212
  • [35] Multi-view dual attention network for 3D object recognition
    Wang, Wenju
    Cai, Yu
    Wang, Tao
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (04) : 3201 - 3212
  • [36] A region feature fusion network for point cloud and image to detect 3D object
    Shi, Yanjun
    Ma, Longfei
    Li, Jiajian
    Wang, Xiaocong
    Yang, Yu
    IET COLLABORATIVE INTELLIGENT MANUFACTURING, 2024, 6 (02)
  • [37] X-View: Non-Egocentric Multi-View 3D Object Detector
    Xie, Liang
    Xu, Guodong
    Cai, Deng
    He, Xiaofei
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 1488 - 1497
  • [38] Multi-view depth estimation based on multi-feature aggregation for 3D reconstruction
    Zhang, Chi
    Liang, Lingyu
    Zhou, Jijun
    Xu, Yong
    COMPUTERS & GRAPHICS-UK, 2024, 122
  • [39] LiDAR-Camera Fusion in Perspective View for 3D Object Detection in Surface Mine
    Ai, Yunfeng
    Yang, Xue
    Song, Ruiqi
    Cui, Chenglin
    Li, Xinqing
    Cheng, Qi
    Tian, Bin
    Chen, Long
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (02) : 3721 - 3730
  • [40] Multi-View Vision Fusion Network: Can 2D Pre-Trained Model Boost 3D Point Cloud Data-Scarce Learning?
    Peng, Haoyang
    Li, Baopu
    Zhang, Bo
    Chen, Xin
    Chen, Tao
    Zhu, Hongyuan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (07) : 5951 - 5962