Panoramic Vision Transformer for Saliency Detection in 360° Videos

Times cited: 11
Authors
Yun, Heeseung [1]
Lee, Sehun [1]
Kim, Gunhee [1]
Affiliation
[1] Seoul Natl Univ, Seoul, South Korea
Source
COMPUTER VISION - ECCV 2022, PT XXXV | 2022 / Vol. 13695
Keywords
360° videos; Saliency detection; Vision transformer
DOI
10.1007/978-3-031-19833-5_25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
360° video saliency detection is one of the challenging benchmarks for 360° video understanding, since non-negligible distortion and discontinuity occur in every projection format of 360° videos, and the capture-worthy viewpoint on the omnidirectional sphere is inherently ambiguous. We present a new framework named Panoramic Vision Transformer (PAVER). We design the encoder using a Vision Transformer with deformable convolution, which enables us not only to plug models pretrained on normal videos into our architecture without additional modules or finetuning, but also to perform the geometric approximation only once, unlike previous deep CNN-based approaches. Thanks to its powerful encoder, PAVER can learn saliency from three simple relative relations among local patch features, outperforming state-of-the-art models on the Wild360 benchmark by large margins without supervision or auxiliary information such as class activation. We demonstrate the utility of our saliency prediction model on the omnidirectional video quality assessment task in VQA-ODV, where we consistently improve performance without any form of supervision, including head movement.
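The abstract's point that the geometric approximation happens only once can be illustrated with a generic sketch. This is not PAVER's code. The helper name `tangent_patch_offsets`, the gnomonic (tangent-plane) projection, the 2:1 equirectangular frame, and the patch size are all assumptions made for illustration. The sketch computes fixed, deformable-convolution-style sampling offsets that let a ViT patch embedding read each patch of an equirectangular frame as if it were a local tangent-plane view. Because the offsets depend only on the frame and patch sizes, they can be precomputed once and reused for every frame.

```python
import numpy as np


def tangent_patch_offsets(height: int, width: int, patch: int) -> np.ndarray:
    """Hypothetical helper: fixed sampling offsets that make each patch x patch
    window of an equirectangular frame sample a gnomonic (tangent-plane) grid.

    Returns shape (height // patch, width // patch, patch, patch, 2) holding
    (row, col) pixel offsets relative to the regular, undeformed patch grid.
    """
    assert width == 2 * height, "sketch assumes a 2:1 equirectangular frame"
    assert height % patch == 0 and width % patch == 0
    delta = 2 * np.pi / width  # angular size of one pixel (equal on both axes at 2:1)

    # Kernel taps relative to the patch centre, in pixels, e.g. [-0.5, 0.5] for patch=2.
    d = np.arange(patch) - (patch - 1) / 2
    dr, dc = np.meshgrid(d, d, indexing="ij")
    # Tangent-plane coordinates; y points north, so taps further down get negative y.
    x = np.tan(dc * delta)
    y = -np.tan(dr * delta)
    rho = np.hypot(x, y)
    ang = np.arctan(rho)                     # angular distance from the tangent point
    safe_rho = np.where(rho == 0, 1.0, rho)  # avoid 0/0 at the tangent point itself

    rows, cols = height // patch, width // patch
    offsets = np.zeros((rows, cols, patch, patch, 2))
    for r in range(rows):
        for c in range(cols):
            # Centre of this patch in pixel coordinates, then as (lat, lon) in radians.
            ci = r * patch + (patch - 1) / 2
            cj = c * patch + (patch - 1) / 2
            lat0 = np.pi / 2 - (ci + 0.5) * np.pi / height
            lon0 = (cj + 0.5) * 2 * np.pi / width - np.pi

            # Inverse gnomonic projection: tangent-plane taps back onto the sphere.
            sin_lat = np.cos(ang) * np.sin(lat0) + y * np.sin(ang) * np.cos(lat0) / safe_rho
            lat = np.arcsin(np.clip(sin_lat, -1.0, 1.0))
            lon = lon0 + np.arctan2(
                x * np.sin(ang),
                rho * np.cos(lat0) * np.cos(ang) - y * np.sin(lat0) * np.sin(ang),
            )

            # Sphere -> fractional equirectangular pixel coordinates. Longitudes are not
            # wrapped here; a real model would also need circular padding at the seam.
            si = (np.pi / 2 - lat) * height / np.pi - 0.5
            sj = (lon + np.pi) * width / (2 * np.pi) - 0.5
            offsets[r, c, ..., 0] = si - (ci + dr)
            offsets[r, c, ..., 1] = sj - (cj + dc)
    return offsets


# The offsets depend only on (height, width, patch), so they are computed once.
off = tangent_patch_offsets(height=6, width=12, patch=2)
print(off.shape)                      # (3, 6, 2, 2, 2)
print(np.abs(off[1]).max().round(3))  # equator row: taps barely move
print(np.abs(off[0]).max().round(3))  # row nearer the pole: taps spread out
```

Near the equator the offsets are close to zero, so a model pretrained on ordinary (perspective) video sees nearly unchanged patches. Toward the poles the taps spread horizontally to undo the equirectangular stretching. In a real model such offsets would be passed to a deformable convolution in the patch-embedding layer. That wiring is deliberately left out of this sketch.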
Pages: 422-439
Page count: 18
Related papers (50 in total)
  • [31] Fabric defect detection via saliency model based on adjacent context coordination and transformer
    Yang, Ruimin
    Guo, Na
    Tian, Bo
    Wang, Junpu
    Liu, Shanliang
    Yu, Miao
    JOURNAL OF ENGINEERED FIBERS AND FABRICS, 2024, 19
  • [32] Pedestrian Head Detection and Tracking via Global Vision Transformer
    Xuan-Thuy Vo
    Van-Dung Hoang
    Duy-Linh Nguyen
    Kang-Hyun Jo
    FRONTIERS OF COMPUTER VISION (IW-FCV 2022), 2022, 1578 : 155 - 167
  • [33] An Intrusion Detection System Using Vision Transformer for Representation Learning
    Ban, Xinbo
    Liu, Ao
    He, Long
    Gong, Li
    FRONTIERS IN CYBER SECURITY, FCS 2023, 2024, 1992 : 531 - 544
  • [34] DeepCPD: deep learning with vision transformer for colorectal polyp detection
    Raseena, T. P.
    Kumar, Jitendra
    Balasundaram, S. R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (32) : 78183 - 78206
  • [35] Improved Deepfake Video Detection Using Convolutional Vision Transformer
    Deressa, Deressa Wodajo
    Lambert, Peter
    Van Wallendael, Glenn
    Atnafu, Solomon
    Mareen, Hannes
    2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024, : 492 - 497
  • [36] Explainable Anomaly Detection Using Vision Transformer Based SVDD
    Baek, Ji-Won
    Chung, Kyungyong
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 74 (03) : 6573 - 6586
  • [37] Unmasking Deception: Empowering Deepfake Detection with Vision Transformer Network
    Arshed, Muhammad Asad
    Alwadain, Ayed
    Ali, Rao Faizan
    Mumtaz, Shahzad
    Ibrahim, Muhammad
    Muneer, Amgad
    MATHEMATICS, 2023, 11 (17)
  • [38] Fault detection of catenary hanger based on EfficientDet and Vision Transformer
    Bian J.
    Xue X.
    Cui Y.
    Xu H.
    Lu Y.
    Journal of Railway Science and Engineering, 2023, 20 (06) : 2340 - 2349
  • [39] Spherical DNNs and Their Applications in 360° Images and Videos
    Xu, Yanyu
    Zhang, Ziheng
    Gao, Shenghua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (10) : 7235 - 7252
  • [40] Spherical Convolution-based Saliency Detection for FoV Prediction in 360-degree Video Streaming
    Peng, Shuai
    Hu, Jialu
    Li, Zitong
    Xiao, Han
    Yang, Shujie
    Xu, Changqiao
    2023 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2023, : 162 - 167