Panoramic Vision Transformer for Saliency Detection in 360° Videos

Times cited: 11

Authors:

Yun, Heeseung [1]

Lee, Sehun [1]

Kim, Gunhee [1]

Affiliation:

[1] Seoul Natl Univ, Seoul, South Korea

Source:

COMPUTER VISION - ECCV 2022, PT XXXV | 2022 / Vol. 13695

Keywords:

360° videos; Saliency detection; Vision transformer;

DOI:

10.1007/978-3-031-19833-5_25

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

360° video saliency detection is one of the challenging benchmarks for 360° video understanding, since non-negligible distortion and discontinuity occur in the projection of any format of 360° videos, and the capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning, but also to perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
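The abstract says the encoder handles projection distortion with deformable convolution and performs the geometric approximation only once, but it does not give the geometry. As a minimal sketch, assuming a tangent-plane (inverse gnomonic) projection onto an equirectangular (ERP) frame, the NumPy code below computes, for one patch location, the pixel offsets that would bend a regular size x size kernel onto the sphere. The function names `gnomonic_grid` and `deformable_offsets`, the `fov` parameter, and the equator-matched regular kernel are hypothetical choices for illustration, not PAVER's published implementation.

```python
import numpy as np

# Illustrative sketch (assumption): tangent-plane sampling for deformable
# convolution on an equirectangular (ERP) frame; not PAVER's exact method.


def gnomonic_grid(lon0, lat0, fov, size):
    """Inverse gnomonic projection of a size x size tangent-plane grid.

    The grid is centred at (lon0, lat0) on the unit sphere and spans `fov`
    radians; returns (lon, lat) arrays in radians, rows running north to south.
    """
    half = np.tan(fov / 2.0)
    t = np.linspace(-half, half, size)
    x, y = np.meshgrid(t, -t)  # x grows east, y grows north
    rho = np.hypot(x, y)
    c = np.arctan(rho)  # angular distance from the tangent point
    safe_rho = np.where(rho > 0, rho, 1.0)  # sin(c) = 0 at the centre anyway
    lat = np.arcsin(np.clip(
        np.cos(c) * np.sin(lat0) + y * np.sin(c) * np.cos(lat0) / safe_rho,
        -1.0, 1.0))
    dlon = np.arctan2(
        x * np.sin(c),
        rho * np.cos(lat0) * np.cos(c) - y * np.sin(lat0) * np.sin(c))
    # The tangent point itself maps exactly to (lon0, lat0).
    lon = lon0 + np.where(rho > 0, dlon, 0.0)
    return lon, lat


def deformable_offsets(lon0, lat0, fov, size, erp_width, erp_height):
    """Pixel offsets (dy, dx) that move a regular size x size ERP kernel onto
    the tangent-plane grid; they depend only on geometry, so compute once."""
    lon, lat = gnomonic_grid(lon0, lat0, fov, size)
    # Angular displacement from the centre, longitude wrapped to [-pi, pi).
    dlon = (lon - lon0 + np.pi) % (2 * np.pi) - np.pi
    dlat = lat - lat0
    # ERP pixels: x grows east, y grows south.
    px = dlon * erp_width / (2 * np.pi)
    py = -dlat * erp_height / np.pi
    # Hypothetical regular kernel with the same angular extent at the equator.
    k = (np.arange(size) - (size - 1) / 2.0) * fov / (size - 1)
    reg_x = k[None, :] * erp_width / (2 * np.pi)
    reg_y = k[:, None] * erp_height / np.pi
    return py - reg_y, px - reg_x


if __name__ == "__main__":
    # Offsets for a 3 x 3 kernel near the north pole of a 512 x 256 ERP frame.
    dy, dx = deformable_offsets(0.0, 1.2, np.pi / 4, 3, 512, 256)
    print(np.round(dy, 2))
    print(np.round(dx, 2))
```

Because the offsets depend only on the patch position and the ERP resolution, not on pixel content, they can be precomputed once and reused for every frame. Near the poles the horizontal offsets grow, reflecting how ERP stretches content at high latitudes, while at the equator they are small.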


Pages: 422-439

Number of pages: 18

50 records in total

[31] Fabric defect detection via saliency model based on adjacent context coordination and transformer
Yang, Ruimin
Guo, Na
Tian, Bo
Wang, Junpu
Liu, Shanliang
Yu, Miao
JOURNAL OF ENGINEERED FIBERS AND FABRICS, 2024, 19
[32] Pedestrian Head Detection and Tracking via Global Vision Transformer
Xuan-Thuy Vo
Van-Dung Hoang
Duy-Linh Nguyen
Kang-Hyun Jo
FRONTIERS OF COMPUTER VISION (IW-FCV 2022), 2022, 1578 : 155 - 167
[33] An Intrusion Detection System Using Vision Transformer for Representation Learning
Ban, Xinbo
Liu, Ao
He, Long
Gong, Li
FRONTIERS IN CYBER SECURITY, FCS 2023, 2024, 1992 : 531 - 544
[34] DeepCPD: deep learning with vision transformer for colorectal polyp detection
Raseena, T. P.
Kumar, Jitendra
Balasundaram, S. R.
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (32) : 78183 - 78206
[35] Improved Deepfake Video Detection Using Convolutional Vision Transformer
Deressa, Deressa Wodajo
Lambert, Peter
Van Wallendael, Glenn
Atnafu, Solomon
Mareen, Hannes
2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024, : 492 - 497
[36] Explainable Anomaly Detection Using Vision Transformer Based SVDD
Baek, Ji-Won
Chung, Kyungyong
CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (03) : 6573 - 6586
[37] Unmasking Deception: Empowering Deepfake Detection with Vision Transformer Network
Arshed, Muhammad Asad
Alwadain, Ayed
Ali, Rao Faizan
Mumtaz, Shahzad
Ibrahim, Muhammad
Muneer, Amgad
MATHEMATICS, 2023, 11 (17)
[38] Fault detection of catenary hanger based on EfficientDet and Vision Transformer
Bian J.
Xue X.
Cui Y.
Xu H.
Lu Y.
Journal of Railway Science and Engineering, 2023, 20 (06) : 2340 - 2349
[39] Spherical DNNs and Their Applications in 360° Images and Videos
Xu, Yanyu
Zhang, Ziheng
Gao, Shenghua
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 7235 - 7252
[40] Spherical Convolution-based Saliency Detection for FoV Prediction in 360-degree Video Streaming
Peng, Shuai
Hu, Jialu
Li, Zitong
Xiao, Han
Yang, Shujie
Xu, Changqiao
2023 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2023, : 162 - 167
