Panoramic Vision Transformer for Saliency Detection in 360° Videos

Cited by: 11

Authors:

Yun, Heeseung [1]

Lee, Sehun [1]

Kim, Gunhee [1]

Affiliations:

[1] Seoul Natl Univ, Seoul, South Korea

Source:

COMPUTER VISION - ECCV 2022, PT XXXV | 2022 / Vol. 13695

Keywords:

360° videos; Saliency detection; Vision transformer

DOI:

10.1007/978-3-031-19833-5_25

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

360° video saliency detection is one of the challenging benchmarks for 360° video understanding since non-negligible distortion and discontinuity occur in the projection of any format of 360° videos, and capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning but also to perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn the saliency from three simple relative relations among local patch features, outperforming state-of-the-art models for the Wild360 benchmark by large margins without supervision or auxiliary information like class activation. We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.

Pages: 422-439

Page count: 18
