Spherical DNNs and Their Applications in 360° Images and Videos

Cited by: 16
Authors
Xu, Yanyu [1]
Zhang, Ziheng [2]
Gao, Shenghua [3, 4]
Affiliations
[1] ASTAR, Inst High Performance Comp IHPC, Singapore 138632, Singapore
[2] AI Prime Co Ltd, Shanghai 200090, Peoples R China
[3] ShanghaiTech Univ, Shanghai Engn Res Ctr Intelligent Vis & Imaging, Shanghai 201210, Peoples R China
[4] ShanghaiTech Univ, Shanghai Engn Res Ctr Energy Efficient & Custom A, Shanghai 201210, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Spherical deep neural networks; saliency detection; gaze prediction; 360 degrees videos; SALIENCY; GAZE; PREDICTION; PERCEPTION; EYE;
DOI
10.1109/TPAMI.2021.3100259
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spherical images or videos, as typical non-Euclidean data, are usually stored in the form of 2D panoramas obtained through an equirectangular projection, which is neither equal-area nor conformal. The distortion caused by the projection limits the performance of vanilla Deep Neural Networks (DNNs) designed for traditional Euclidean data. In this paper, we design a novel Spherical Deep Neural Network (DNN) to deal with the distortion caused by the equirectangular projection. Specifically, we customize a set of components, including a spherical convolution, a spherical pooling, a spherical ConvLSTM cell and a spherical MSE loss, as replacements for their counterparts in vanilla DNNs for spherical data. The core idea is to change the identical behavior of the conventional operations in vanilla DNNs across different feature patches so that they are adjusted to the distortion caused by the variance of the sampling rate among different feature patches. We demonstrate the effectiveness of our Spherical DNNs for saliency detection and gaze estimation in 360° videos. For saliency detection, we take the temporal coherence of an observer's viewing process into consideration and propose to use a Spherical U-Net and a Spherical ConvLSTM to predict the saliency maps for each frame sequentially. For gaze prediction, we propose to leverage a Spherical Encoder Module to extract spatial panoramic features, which we then combine with the gaze trajectory feature extracted by an LSTM for future gaze prediction. To facilitate the study of 360° video saliency detection, we further construct a large-scale 360° video saliency detection dataset that consists of 104 360° videos viewed by 20+ human subjects. Comprehensive experiments validate the effectiveness of our proposed Spherical DNNs for 360° handwritten digit classification and sport classification, and for saliency detection and gaze tracking in 360° videos. We also visualize the regions contributing to the classification decisions in our proposed Spherical DNNs via the Grad-CAM technique in the classification task, and the results show that our Spherical DNNs consistently leverage reasonable and important regions for decision making, regardless of the large distortions. All code and the dataset are available at https://github.com/svip-lab/SphericalDNNs.
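The abstract attributes the distortion to the sampling rate varying across an equirectangular panorama. One common way to account for this in a loss is to weight each pixel by the solid angle it covers on the sphere, which is proportional to the cosine of its latitude. The sketch below illustrates that general idea in plain Python; it is an assumption-laden illustration, not the authors' spherical MSE loss, and the function names `latitude_weights` and `solid_angle_weighted_mse` are hypothetical.

```python
import math


def latitude_weights(height):
    """Per-row weights for an equirectangular grid with `height` rows.

    Assumption: row i is centered at latitude pi/2 - (i + 0.5) * pi / height,
    and every pixel in that row covers a solid angle proportional to the
    cosine of that latitude. Weights are normalized to sum to 1 over rows.
    """
    raw = [
        math.cos(math.pi / 2 - (i + 0.5) * math.pi / height)
        for i in range(height)
    ]
    total = sum(raw)
    return [w / total for w in raw]


def solid_angle_weighted_mse(pred, target):
    """Squared error between two equirectangular maps, weighted by solid angle.

    `pred` and `target` are lists of rows (lists of floats) of equal shape.
    Each row's mean squared error is scaled by that row's latitude weight,
    so errors near the poles count less than errors near the equator.
    """
    width = len(pred[0])
    weights = latitude_weights(len(pred))
    loss = 0.0
    for w, p_row, t_row in zip(weights, pred, target):
        row_err = sum((p - t) ** 2 for p, t in zip(p_row, t_row)) / width
        loss += w * row_err
    return loss


# Usage: the same unit error costs less in a polar row than in an equatorial row.
zeros = [[0.0] * 8 for _ in range(4)]
pole_error = [[1.0] * 8] + [[0.0] * 8 for _ in range(3)]
equator_error = [[0.0] * 8, [1.0] * 8, [0.0] * 8, [0.0] * 8]
print(solid_angle_weighted_mse(zeros, pole_error))
print(solid_angle_weighted_mse(zeros, equator_error))
```

With this weighting, a plain (unweighted) MSE is recovered only when all rows receive equal weight, which is exactly the assumption that breaks down on an equirectangular panorama.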
Pages: 7235-7252
Number of pages: 18