Learning Generalized Spatial-Temporal Deep Feature Representation for No-Reference Video Quality Assessment

Cited by: 45

Authors:

Chen, Baoliang [1]

Zhu, Lingyu [1]

Li, Guo [2]

Lu, Fangbo [2]

Fan, Hongfei [2]

Wang, Shiqi [1]

Affiliations:

[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China

[2] Kingsoft Cloud, Beijing 100000, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Vol. 32 / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Quality assessment; Training; Video recording; Image quality; Streaming media; Nonlinear distortion; Video quality assessment; generalization capability; deep neural networks; temporal aggregation; IMAGE; STATISTICS; DATABASE;

DOI:

10.1109/TCSVT.2021.3088505

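Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Subject classification codes: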

0808 ; 0809 ;

Abstract:

In this work, we propose a no-reference video quality assessment method, aiming to achieve high generalization capability in cross-content, cross-resolution and cross-frame-rate quality prediction. In particular, we evaluate the quality of a video by learning effective feature representations in the spatial-temporal domain. In the spatial domain, to tackle resolution and content variations, we impose Gaussian distribution constraints on the quality features. The unified distribution can significantly reduce the domain gap between different video samples, resulting in a more generalized quality feature representation. Along the temporal dimension, inspired by the mechanism of visual perception, we propose a pyramid temporal aggregation module that involves short-term and long-term memory to aggregate the frame-level quality. Experiments show that our method outperforms the state-of-the-art methods in cross-dataset settings and achieves comparable performance in intra-dataset configurations, demonstrating the high generalization capability of the proposed method. The code is released at https://github.com/Baoliang93/GSTVQA


Pages: 1903-1916

Page count: 14

