Panoramic Vision Transformer for Saliency Detection in 360° Videos

Cited by: 11
Authors
Yun, Heeseung [1]
Lee, Sehun [1]
Kim, Gunhee [1]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
Source
COMPUTER VISION - ECCV 2022, PT XXXV | 2022 / Vol. 13695
Keywords
360° videos; Saliency detection; Vision transformer
DOI
10.1007/978-3-031-19833-5_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
360° video saliency detection is one of the challenging benchmarks for 360° video understanding since non-negligible distortion and discontinuity occur in the projection of any format of 360° videos, and capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning but also to perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn the saliency from three simple relative relations among local patch features, outperforming state-of-the-art models for the Wild360 benchmark by large margins without supervision or auxiliary information like class activation. We demonstrate the utility of our saliency prediction model with the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
Pages: 422 - 439
Page count: 18
Related papers
50 in total
  • [41] DeepFake detection with multi-scale convolution and vision transformer
    Lin, Hao
    Huang, Wenmin
    Luo, Weiqi
    Lu, Wei
    DIGITAL SIGNAL PROCESSING, 2023, 134
  • [42] COVID-Transformer: Interpretable COVID-19 Detection Using Vision Transformer for Healthcare
    Shome, Debaditya
    Kar, T.
    Mohanty, Sachi Nandan
    Tiwari, Prayag
    Muhammad, Khan
    AlTameem, Abdullah
    Zhang, Yazhou
    Saudagar, Abdul Khader Jilani
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2021, 18 (21)
  • [43] Drosophila-Vision-Inspired Motion Perception Model and Its Application in Saliency Detection
    Chen, Zhe
    Mu, Qi
    Han, Guangjie
    Lu, Huimin
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 819 - 830
  • [44] Visual perception enhancement fall detection algorithm based on vision transformer
    Cai, Xi
    Wang, Xiangcheng
    Bao, Kexin
    Chen, Yinuo
    Jiao, Yin
    Han, Guang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
  • [45] Shifted-Window Hierarchical Vision Transformer for Distracted Driver Detection
    Koay, Hong Vin
    Chuah, Joon Huang
    Chow, Chee-Onn
    2021 IEEE REGION 10 SYMPOSIUM (TENSYMP), 2021
  • [46] Saliency Tree: A Novel Saliency Detection Framework
    Liu, Zhi
    Zou, Wenbin
    Le Meur, Olivier
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2014, 23 (05) : 1937 - 1952
  • [47] Optimized vision transformer encoder with cnn for automatic psoriasis disease detection
    Vishwakarma, Gagan
    Nandanwar, Amit Kumar
    Thakur, Ghanshyam Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (21) : 59597 - 59616
  • [48] Comparison of Eye-gaze Detection using CNN and Vision Transformer
    Niikura, D.
    Abe, K.
    IEEJ Transactions on Electronics, Information and Systems, 2024, 144 (07) : 683 - 684
  • [49] CrimeNet: Neural Structured Learning using Vision Transformer for violence detection
    Rendon-Segador, Fernando J.
    Alvarez-Garcia, Juan A.
    Salazar-Gonzalez, Jose L.
    Tommasi, Tatiana
    NEURAL NETWORKS, 2023, 161 : 318 - 329
  • [50] SViT: A Spectral Vision Transformer for the Detection of REM Sleep Behavior Disorder
    Gunter, Katarina Mary
    Brink-Kjaer, Andreas
    Mignot, Emmanuel
    Sorensen, Helge B. D.
    During, Emmanuel
    Jennum, Poul
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2023, 27 (09) : 4285 - 4292