Panoramic Vision Transformer for Saliency Detection in 360° Videos

Times cited: 11

Authors:

Yun, Heeseung [1]

Lee, Sehun [1]

Kim, Gunhee [1]

Affiliation:

[1] Seoul Natl Univ, Seoul, South Korea

Source:

COMPUTER VISION - ECCV 2022, PT XXXV | 2022 / Vol. 13695

Keywords:

360° videos; Saliency detection; Vision transformer

DOI:

10.1007/978-3-031-19833-5_25

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

360° video saliency detection is one of the challenging benchmarks for 360° video understanding, because every projection format of a 360° video introduces non-negligible distortion and discontinuity, and the capture-worthy viewpoint on the omnidirectional sphere is inherently ambiguous. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which not only lets us plug pretrained models from normal videos into our architecture without additional modules or finetuning, but also lets us perform the geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
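The abstract says the deformable-convolution encoder performs the geometric approximation only once. As a rough illustration of that general idea (not the authors' code), the Python sketch below precomputes, for one kernel position on an equirectangular frame, how far each tap of a k×k kernel would have to shift so that it samples a small tangent-plane patch on the sphere instead of the distorted pixel grid. Because the offsets depend only on frame geometry, they could be computed once and reused for every frame. The inverse-gnomonic construction (in the spirit of SphereNet-style distortion-aware kernels), the equirectangular conventions, the one-pixel angular step, and all function names are assumptions made for this sketch.

```python
import math


def pixel_to_sphere(u, v, width, height):
    """Equirectangular pixel (u, v) -> (lat, lon) in radians; v = 0 is the north pole (assumed convention)."""
    lon = u / width * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - v / height * math.pi
    return lat, lon


def sphere_to_pixel(lat, lon, width, height):
    """(lat, lon) in radians -> continuous pixel coords; u is deliberately not wrapped."""
    u = (lon + math.pi) / (2.0 * math.pi) * width
    v = (math.pi / 2.0 - lat) / math.pi * height
    return u, v


def inverse_gnomonic(x, y, lat0, lon0):
    """Map a point (x, y) on the plane tangent at (lat0, lon0) back onto the unit sphere."""
    rho = math.hypot(x, y)
    if rho == 0.0:
        return lat0, lon0
    c = math.atan(rho)
    s = math.cos(c) * math.sin(lat0) + y * math.sin(c) * math.cos(lat0) / rho
    lat = math.asin(max(-1.0, min(1.0, s)))  # clamp guards against rounding past +/-1
    lon = lon0 + math.atan2(
        x * math.sin(c),
        rho * math.cos(lat0) * math.cos(c) - y * math.sin(lat0) * math.sin(c),
    )
    return lat, lon


def kernel_offsets(u0, v0, width, height, k=3):
    """Hypothetical helper: (du, dv) for each tap of a k x k kernel centred at (u0, v0), row-major."""
    r = k // 2
    step = math.tan(2.0 * math.pi / width)  # assumed: one equatorial pixel on the tangent plane
    lat0, lon0 = pixel_to_sphere(u0, v0, width, height)
    offsets = []
    for row in range(-r, r + 1):          # row > 0 means further down the image
        for col in range(-r, r + 1):
            x, y = col * step, -row * step  # tangent-plane y points north (up the image)
            lat, lon = inverse_gnomonic(x, y, lat0, lon0)
            u, v = sphere_to_pixel(lat, lon, width, height)
            # Offset = spherical sample location minus the regular-grid location.
            offsets.append((u - (u0 + col), v - (v0 + row)))
    return offsets


if __name__ == "__main__":
    # Equator of a 512x256 frame versus latitude 80 degrees.
    print(kernel_offsets(256.0, 128.0, 512, 256)[5])
    print(kernel_offsets(256.0, 256.0 * 10.0 / 180.0, 512, 256)[5])
```

At the equator the axis-aligned taps get offsets of essentially zero, while at 80° latitude the right-hand tap is pushed out to roughly 1/cos(80°) ≈ 5.7 pixels, which is the horizontal stretching an equirectangular projection introduces. A real encoder would feed such fixed offsets to a deformable convolution and wrap longitudes horizontally; that wiring is omitted here.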

Pages: 422-439

Number of pages: 18
