Multi-Range View Aggregation Network With Vision Transformer Feature Fusion for 3D Object Retrieval

Cited by: 7
Authors
Lin, Dongyun [1]
Li, Yiqun [1]
Cheng, Yi [1]
Prasad, Shitala [1]
Guo, Aiyuan [1]
Cao, Yanpeng [2]
Affiliations
[1] ASTAR, Inst Infocomm Res I2R, Singapore 138632, Singapore
[2] Zhejiang Univ, State Key Lab Fluid Power & Mechatron Syst, Hangzhou 310027, Peoples R China
Keywords
Three-dimensional displays; Feature extraction; Transformers; Convolutional neural networks; Visualization; Fuses; Deep learning; 3D object retrieval; multi-range view aggregation; multi-head self-attention; feature fusion; SIMILARITY; DIFFUSION;
DOI
10.1109/TMM.2023.3246229
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
View-based methods have achieved state-of-the-art performance in 3D object retrieval, but they still face two major challenges. The first is how to leverage inter-view correlation to enhance view-level visual features. The second is how to effectively fuse view-level features into a discriminative global descriptor. To address these two challenges, we propose a multi-range view aggregation network (MRVA-Net) with a vision-transformer-based feature fusion scheme for 3D object retrieval. Existing methods aggregate only neighboring or adjacent views, which can introduce redundant information. In contrast, we propose a multi-range view aggregation module that enhances individual view representations by aggregating not only neighboring views but also views at different ranges. Furthermore, to generate the global descriptor, we employ the multi-head self-attention mechanism introduced by the vision transformer to fuse the view-level features. Extensive experiments on three public datasets, ModelNet40, ShapeNet Core55, and MCB-A, demonstrate the superiority of the proposed network over state-of-the-art methods in 3D object retrieval.
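The two ideas named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' MRVA-Net: the ring arrangement of views, the offsets in `ranges`, the averaging rule, the random projection weights (stand-ins for learned parameters), and the final mean pooling are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_multi_range(views, ranges=(1, 2, 3)):
    """Enhance each view by averaging it with views at several offsets.

    Assumption: views lie on a ring, so offsets wrap around.
    """
    n, dim = len(views), len(views[0])
    enhanced = []
    for i in range(n):
        group = [views[i]]
        for r in ranges:
            group.append(views[(i - r) % n])
            group.append(views[(i + r) % n])
        enhanced.append([sum(v[d] for v in group) / len(group) for d in range(dim)])
    return enhanced


def fuse_with_self_attention(views, num_heads=2, seed=0):
    """Fuse view features into one descriptor via multi-head self-attention.

    Assumption: random projections replace learned weights, and the
    attended views are mean-pooled into the global descriptor.
    """
    rng = random.Random(seed)
    n, dim = len(views), len(views[0])
    head_dim = dim // num_heads

    def rand_matrix(rows, cols):
        scale = 1.0 / math.sqrt(rows)
        return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

    heads = []
    for _ in range(num_heads):
        q = matmul(views, rand_matrix(dim, head_dim))
        k = matmul(views, rand_matrix(dim, head_dim))
        v = matmul(views, rand_matrix(dim, head_dim))
        k_t = [list(col) for col in zip(*k)]          # transpose of k
        scores = matmul(q, k_t)                       # view-to-view similarity
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        heads.append(matmul(weights, v))              # attention-weighted values
    # Concatenate the heads for each view, then mean-pool across views.
    attended = [sum((h[i] for h in heads), []) for i in range(n)]
    out_dim = num_heads * head_dim
    return [sum(row[d] for row in attended) / n for d in range(out_dim)]


# Toy input: 8 views, each with a 4-dimensional feature vector.
views = [[math.sin(i + d) for d in range(4)] for i in range(8)]
descriptor = fuse_with_self_attention(aggregate_multi_range(views))
print(len(descriptor))
```

In a retrieval setting, descriptors produced this way would be compared with a distance such as cosine or Euclidean; the choice of metric is likewise an assumption here, not something stated in the abstract.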
Pages: 9108-9119
Number of pages: 12
Related papers
50 in total
  • [41] DRCNN: Dynamic Routing Convolutional Neural Network for Multi-View 3D Object Recognition
    Sun, Kai
    Zhang, Jiangshe
    Liu, Junmin
    Yu, Ruixuan
    Song, Zengjie
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 868 - 877
  • [42] Sketch-based 3D Model Retrieval via Multi-feature Fusion
    Wen, Yafei
    Zou, Changqing
    Liu, Jianzhuang
    Du, Shuze
    Chen, Shifeng
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 4570 - 4575
  • [43] 2DSlicesNet: A 2D Slice-Based Convolutional Neural Network for 3D Object Retrieval and Classification
    Taybi, Ilyass Ouazzani
    Gadi, Taoufiq
    Alaoui, Rachid
    IEEE ACCESS, 2021, 9 : 24041 - 24049
  • [44] M3DGAF: Monocular 3D Object Detection With Geometric Appearance Awareness and Feature Fusion
    Chen, Mu
    Liu, Pengfei
    Zhao, Huaici
    IEEE SENSORS JOURNAL, 2023, 23 (11) : 11232 - 11240
  • [45] 3D Object Detection With Multi-Frame RGB-Lidar Feature Alignment
    Ercelik, Emec
    Yurtsever, Ekim
    Knoll, Alois
    IEEE ACCESS, 2021, 9 : 143138 - 143149
  • [46] Height-Adaptive Deformable Multi-Modal Fusion for 3D Object Detection
    Li, Jiahao
    Chen, Lingshan
    Li, Zhen
    IEEE ACCESS, 2025, 13 : 52385 - 52396
  • [47] Joint Heterogeneous Feature Learning and Distribution Alignment for 2D Image-Based 3D Object Retrieval
    Su, Yuting
    Li, Yuqian
    Nie, Weizhi
    Song, Dan
    Liu, An-An
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (10) : 3765 - 3776
  • [48] Multi-Modal Fusion Based on Depth Adaptive Mechanism for 3D Object Detection
    Liu, Zhanwen
    Cheng, Juanru
    Fan, Jin
    Lin, Shan
    Wang, Yang
    Zhao, Xiangmo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2025, 27 : 707 - 717
  • [49] Group-pair deep feature learning for multi-view 3d model retrieval
    Chen, Xiuxiu
    Liu, Li
    Zhang, Long
    Zhang, Huaxiang
    Meng, Lili
    Liu, Dongmei
    APPLIED INTELLIGENCE, 2022, 52 (02) : 2013 - 2022
  • [50] View-based 3D object retrieval via multi-modal graph learning
    Zhao, Sicheng
    Yao, Hongxun
    Zhang, Yanhao
    Wang, Yasi
    Liu, Shaohui
    SIGNAL PROCESSING, 2015, 112 : 110 - 118