Local Selective Vision Transformer for Depth Estimation Using a Compound Eye Camera

Cited by: 8
Authors
Oh, Wooseok [1]
Yoo, Hwiyeon [1]
Ha, Taeoh [1]
Oh, Songhwai [1]
Affiliations
[1] Seoul Natl Univ, ASRI, Dept Elect & Comp Engn, Seoul 08826, South Korea
Keywords
Compound Eye; Depth Estimation; Vision Transformer
DOI
10.1016/j.patrec.2023.02.010
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A compound eye camera is a hemispherical camera made by mimicking the structure of an insect's eye. In general, a compound eye camera is composed of a set of single eye cameras. The compound eye camera has various advantages due to its unique structure and can be used in various vision tasks. In order to apply the compound eye camera to various vision tasks using 3D information, depth estimation is required. However, due to the difference between the compound eye image and the 2D RGB image, it is hard to use the existing depth estimation methods directly. In this paper, we propose a transformer-based neural network for eye-wise depth estimation, which is suitable for the compound eye image. We modify the self-attention module with local selective self-attention to take advantage of the compound eye's hemispherical structure. In addition, we reduce the computational amount and increase the performance through the eye selection module. Using the proposed local selective self-attention and eye selection modules, we are able to improve the performance without large-scale pre-training. Compared to the ResNet-based depth estimation network, our method showed 2.8% and 1.4% higher performance on the GAZEBO and Matterport3D datasets, respectively, with 15.3% fewer network parameters. © 2023 Elsevier B.V. All rights reserved.
Pages: 82-89
Page count: 8