Study of Spatio-Temporal Modeling in Video Quality Assessment

Cited by: 7
Authors
Fang, Yuming [1]
Li, Zhaoqian [1]
Yan, Jiebin [1]
Sui, Xiangjie [1]
Liu, Hantao [2]
Affiliations
[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330032, Jiangxi, Peoples R China
[2] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 3AA, Wales
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Video quality assessment; spatio-temporal modeling; recurrent neural network; PREDICTION; DATABASE; FLOW;
DOI
10.1109/TIP.2023.3272480
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video quality assessment (VQA) has received remarkable attention recently. Most popular VQA models employ recurrent neural networks (RNNs) to capture the temporal quality variation of videos. However, each long-term video sequence is commonly labeled with a single quality score, from which RNNs might not be able to learn long-term quality variation well. What is the real role of RNNs in learning the visual quality of videos? Do they learn spatio-temporal representations as expected, or do they just aggregate spatial features redundantly? In this study, we conduct a comprehensive investigation by training a family of VQA models with carefully designed frame sampling strategies and spatio-temporal fusion methods. Our extensive experiments on four publicly available in-the-wild video quality datasets lead to two main findings. First, the plausible spatio-temporal modeling module (i.e., RNNs) does not facilitate quality-aware spatio-temporal feature learning. Second, sparsely sampled video frames can achieve performance competitive with using all video frames as input. In other words, spatial features play a vital role in capturing video quality variation for VQA. To the best of our knowledge, this is the first work to explore the issue of spatio-temporal modeling in VQA.
Pages: 2693-2702
Page count: 10
Related papers
50 in total
  • [41] Spatio-temporal querying in video databases
    Köprülü, M
    Çiçekli, NK
    Yazici, A
    FLEXIBLE QUERY ANSWERING SYSTEMS, PROCEEDINGS, 2002, 2522 : 251 - 262
  • [42] Kronecker PCA Based Spatio-Temporal Modeling of Video for Dismount Classification
    Greenewald, Kristjan H.
    Hero, Alfred O., III
    ALGORITHMS FOR SYNTHETIC APERTURE RADAR IMAGERY XXI, 2014, 9093
  • [43] Video modeling via spatio-temporal adaptive localized learning (STALL)
    Zheng, Yunfei
    Li, Xin
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 979 - +
  • [44] Deep Learning Based Video Spatio-Temporal Modeling for Emotion Recognition
    Fonnegra, Ruben D.
    Diaz, Gloria M.
    HUMAN-COMPUTER INTERACTION: THEORIES, METHODS, AND HUMAN ISSUES, HCI INTERNATIONAL 2018, PT I, 2018, 10901 : 397 - 408
  • [45] A spatio-temporal representation scheme for modeling moving objects in video data
    Shim, CB
    Chang, JW
    ADVANCES IN COMPUTING SCIENCE-ASIAN 2000, PROCEEDINGS, 2000, 1961 : 104 - 118
  • [46] Spatio-temporal querying in video databases
    Koprulu, M
    Cicekli, NK
    Yazici, A
    INFORMATION SCIENCES, 2004, 160 (1-4) : 131 - 152
  • [47] Spatio-Temporal Context Modeling for BoW-Based Video Classification
    Yi, Saehoon
    Pavlovic, Vladimir
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2013, : 779 - 786
  • [48] Point Spatio-Temporal Transformer Networks for Point Cloud Video Modeling
    Fan, Hehe
    Yang, Yi
    Kankanhalli, Mohan
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02) : 2181 - 2192
  • [49] VIDEO ACTION RECOGNITION WITH SPATIO-TEMPORAL GRAPH EMBEDDING AND SPLINE MODELING
    Yuan, Yin
    Zheng, Haomian
    Li, Zhu
    Zhang, David
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 2422 - 2425
  • [50] Novel Spatio-Temporal Structural Information Based Video Quality Metric
    Wang, Yue
    Jiang, Tingting
    Ma, Siwei
    Gao, Wen
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2012, 22 (07) : 989 - 998