Panoramic Vision Transformer for Saliency Detection in 360° Videos

Cited by: 11
Authors
Yun, Heeseung [1]
Lee, Sehun [1]
Kim, Gunhee [1]
Affiliation
[1] Seoul Natl Univ, Seoul, South Korea
Source
COMPUTER VISION - ECCV 2022, PT XXXV | 2022 / Vol. 13695
Keywords
360° videos; Saliency detection; Vision transformer
DOI
10.1007/978-3-031-19833-5_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
360° video saliency detection is one of the challenging benchmarks for 360° video understanding, since non-negligible distortion and discontinuity occur in the projection of any format of 360° videos, and the capture-worthy viewpoint in the omnidirectional sphere is ambiguous by nature. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using Vision Transformer with deformable convolution, which enables us not only to plug pretrained models from normal videos into our architecture without additional modules or finetuning, but also to perform geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
Pages: 422-439
Page count: 18