Heterogeneous Feature Fusion Module Based on CNN and Transformer for Multiview Stereo Reconstruction

Cited by: 4
Authors
Gao, Rui [1]
Xu, Jiajia [1]
Chen, Yipeng [2]
Cho, Kyungeun [1]
Affiliations
[1] Dongguk Univ Seoul, Dept Multimedia Engn, 30 Pildongro 1 Gil, Seoul 04620, South Korea
[2] Dongguk Univ Seoul, Dept Autonomous Things Intelligence, 30 Pildongro 1 Gil, Seoul 04620, South Korea
Funding
National Research Foundation, Singapore;
Keywords
multi-view stereo; 3D reconstruction; deep learning; transformer;
DOI
10.3390/math11010112
Chinese Library Classification (CLC) number
O1 [Mathematics];
Subject classification code
0701; 070101;
Abstract
Multiview stereo (MVS), which creates 3D models of a scene from photographs, has been a vital area of computer vision research for decades. This study presents an effective MVS network for 3D reconstruction from multiview images. Other learning-based reconstruction techniques work well, but because CNNs (convolutional neural networks) can extract only the local features of an image, their results contain many artifacts. Herein, a transformer and a CNN are used to extract the global and local features of the image, respectively. Additionally, hierarchical aggregation and heterogeneous interaction modules are used to improve these features. These modules are based on the transformer and can extract dense features with 3D consistency and global context, which are necessary for accurate matching in MVS.
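The contrast the abstract draws between local (CNN) and global (transformer) features can be illustrated with a minimal sketch. This is not the authors' network: the function names `local_features`, `global_features`, and `fuse` are hypothetical, the attention has no learned weights, and feeding the convolution output into the attention step and concatenating the two is an assumed simplification chosen only to show the idea.

```python
import numpy as np


def local_features(image, kernel):
    """Valid-mode 2D cross-correlation: each output sees only a small window."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def global_features(tokens):
    """Single-head self-attention without learned weights (an assumption):
    every output row is a softmax-weighted mix of all input tokens."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def fuse(image, kernel):
    """Concatenate per-location local features with their global counterparts."""
    local = local_features(image, kernel)
    tokens = local.reshape(-1, 1)      # one single-channel token per location
    context = global_features(tokens)  # each token now depends on all locations
    return np.concatenate([tokens, context], axis=1)


# Hypothetical usage on a toy 5x5 "image" with a 3x3 averaging kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
fused = fuse(image, np.ones((3, 3)) / 9.0)
print(fused.shape)
```

Each row of `fused` holds a local response and a context value for one image location. A real MVS network would use learned convolution and attention weights and many feature channels.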
Pages: 14