TVFormer: Trajectory-guided Visual Quality Assessment on 360° Images with Transformers

Cited by: 14
Authors
Yang, Li [1 ]
Xu, Mai [1 ]
Liu, Tie [1 ]
Huo, Liangyu [1 ]
Gao, Xinbo [2 ]
Affiliations
[1] Beihang University, Beijing, China
[2] Chongqing University of Posts and Telecommunications, Chongqing, China
Source
Proceedings of the 30th ACM International Conference on Multimedia (MM 2022), 2022
Funding
Beijing Natural Science Foundation
Keywords
360° images; BVQA; head trajectory; Transformer; prediction; model
DOI
10.1145/3503161.3547748
Chinese Library Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Visual quality assessment (VQA) on 360° images plays an important role in optimizing immersive multimedia systems. Because pristine 360° images are rarely available in the real world, blind VQA (BVQA) on 360° images has drawn much research attention. In subjective VQA on 360° images, humans intuitively make quality-scoring decisions based on the quality degradation of each viewport observed along their head trajectories. Unfortunately, existing BVQA works for 360° images neglect the dynamic nature of head trajectories and the interactions among viewports, and thus fail to produce human-like quality scores. In this paper, we propose a novel Transformer-based approach for trajectory-guided VQA on 360° images, named TVFormer, which accomplishes both head trajectory prediction and BVQA for 360° images. For the first task, we develop a trajectory-aware memory updater (TMU) module to maintain the coherence and accuracy of the predicted head trajectories. For the BVQA task, we propose a spatio-temporal factorized self-attention (STF) module in the encoder of TVFormer to capture long-range quality dependencies across time-ordered viewports. By feeding the predicted head trajectories into the BVQA task, we obtain human-like quality scores. Extensive experiments demonstrate the superior BVQA performance of TVFormer over state-of-the-art approaches on three benchmark datasets.
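The STF module described in the abstract lends itself to a compact illustration. Below is a minimal PyTorch sketch of spatio-temporal factorized self-attention, assuming the common factorization into a within-viewport (spatial) attention pass followed by an across-viewport (temporal) pass over the time-ordered viewports of a head trajectory. The class name STFBlock, the pre-norm residual layout, and all dimensions are illustrative assumptions, not the authors' exact design.

```python
# A hedged sketch of spatio-temporal factorized self-attention (STF):
# spatial attention within each viewport, then temporal attention across
# the time-ordered viewports of a head trajectory. Names and shapes are
# assumptions for illustration, not the paper's reference implementation.
import torch
import torch.nn as nn

class STFBlock(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, patches, dim) -- patch tokens of T viewports
        # sampled along a (predicted) head trajectory.
        b, t, p, d = x.shape

        # Spatial pass: patch tokens attend within each viewport.
        xs = x.reshape(b * t, p, d)
        xs_n = self.norm1(xs)
        xs = xs + self.spatial_attn(xs_n, xs_n, xs_n, need_weights=False)[0]

        # Temporal pass: each spatial location attends across viewports,
        # capturing long-range quality dependency along the trajectory.
        xt = xs.reshape(b, t, p, d).permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt_n = self.norm2(xt)
        xt = xt + self.temporal_attn(xt_n, xt_n, xt_n, need_weights=False)[0]

        return xt.reshape(b, p, t, d).permute(0, 2, 1, 3)

# Usage: 2 trajectories, 8 viewports each, 49 patch tokens per viewport.
tokens = torch.randn(2, 8, 49, 256)
out = STFBlock()(tokens)
print(out.shape)  # torch.Size([2, 8, 49, 256])
```

The appeal of this factorization is cost: joint attention over all t·p tokens scales as O((t·p)²), whereas the factorized passes scale as O(t·p² + p·t²), which is what makes modeling long-range dependencies across many viewports tractable.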
Pages: 10