Study of Spatio-Temporal Modeling in Video Quality Assessment

Cited by: 7

Authors:

Fang, Yuming [1]

Li, Zhaoqian [1]

Yan, Jiebin [1]

Sui, Xiangjie [1]

Liu, Hantao [2]

Affiliations:

[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330032, Jiangxi, Peoples R China

[2] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 3AA, Wales

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Video quality assessment; spatio-temporal modeling; recurrent neural network; PREDICTION; DATABASE; FLOW;

DOI:

10.1109/TIP.2023.3272480

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video quality assessment (VQA) has received remarkable attention recently. Most of the popular VQA models employ recurrent neural networks (RNNs) to capture the temporal quality variation of videos. However, each long-term video sequence is commonly labeled with a single quality score, with which RNNs might not be able to learn long-term quality variation well: What's the real role of RNNs in learning the visual quality of videos? Does it learn spatio-temporal representation as expected or just aggregating spatial features redundantly? In this study, we conduct a comprehensive study by training a family of VQA models with carefully designed frame sampling strategies and spatio-temporal fusion methods. Our extensive experiments on four publicly available in-the-wild video quality datasets lead to two main findings. First, the plausible spatio-temporal modeling module (i.e., RNNs) does not facilitate quality-aware spatio-temporal feature learning. Second, sparsely sampled video frames are capable of obtaining the competitive performance against using all video frames as the input. In other words, spatial features play a vital role in capturing video quality variation for VQA. To our best knowledge, this is the first work to explore the issue of spatio-temporal modeling in VQA.


Pages: 2693-2702

Page count: 10
