Multi-Modal Learning-Based Blind Video Quality Assessment Metric for Synthesized Views

Cited by: 2

Authors:

Jin, Chongchong ^{[1,2]}

Peng, Zongju ^{[1,2]}

Chen, Fen ^{[1]}

Jiang, Gangyi ^{[2]}

Yu, Mei ^{[2]}

Affiliations:

[1] Chongqing Univ Technol, Sch Elect & Elect Engn, Chongqing 400054, Peoples R China

[2] Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China

Source:

IEEE TRANSACTIONS ON BROADCASTING | 2024 / Vol. 70 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Measurement; Distortion; Feature extraction; Quality assessment; Video recording; Visualization; Convolutional neural networks; Synthesized video quality assessment; no-reference; multi-model learning; sparse dictionary; IMAGE;

DOI:

10.1109/TBC.2023.3284411

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Quality degradation in synthesized video directly affects the widespread adoption of immersive video, so it is crucial to design a quality assessment model that can determine whether a synthesized video meets the requirements of commercial broadcasting. However, designing a general-purpose no-reference quality assessment metric for synthesized videos is difficult because of imperfect view synthesis technology and scene diversity. Existing quality assessment algorithms for synthesized views are mostly based on handcrafted feature extraction. Inspired by the theory that input stimuli are processed hierarchically and sparsely in the cerebral cortex, we combine Convolutional Neural Network (CNN) learning and sparse dictionary learning mechanisms and propose a Multi-Model Learning based Blind Synthesized Video Quality Assessment (MML-BSVQA) metric. First, to better reflect spatio-temporal distortions, we convert the synthesized video into the Spatial Domain (SD), Vertical Temporal Domain (VTD) and Horizontal Temporal Domain (HTD) using a video decomposition operation plus optical flow estimation. Second, we extract deep semantic features from the three domains with a pre-trained CNN model. Third, we represent the sparse features of the three domains using their respective trained over-complete sparse dictionaries. Note that both the CNN model and the sparse dictionaries are trained on natural videos to ensure the generality of the proposed MML-BSVQA metric. Finally, the quality score of a synthesized video is generated by weighted regression. Experimental results on three synthesized video databases demonstrate that the proposed metric outperforms classic and state-of-the-art quality assessment metrics.
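
The decomposition of a video into one spatial and two temporal domains can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a grayscale video stored as a NumPy array of shape (T, H, W), it omits the optical flow estimation the abstract mentions, and the function name `decompose_video` and the mapping of "vertical" and "horizontal" to fixed columns and fixed rows are assumptions made for illustration only.

```python
import numpy as np


def decompose_video(video):
    """Split a (T, H, W) grayscale video into three stacks of 2-D slices.

    Hypothetical helper: the slice naming is an assumption, and the
    optical flow step described in the abstract is not modelled here.
    """
    video = np.asarray(video)
    if video.ndim != 3:
        raise ValueError("expected an array of shape (T, H, W)")

    # Spatial domain: the T ordinary frames, each of shape (H, W).
    spatial = video
    # Vertical temporal slices: one (H, T) image per fixed column x.
    vertical_temporal = video.transpose(2, 1, 0)
    # Horizontal temporal slices: one (T, W) image per fixed row y.
    horizontal_temporal = video.transpose(1, 0, 2)
    return spatial, vertical_temporal, horizontal_temporal


# Toy clip: 4 frames of 3 x 5 pixels with distinct values.
T, H, W = 4, 3, 5
clip = np.arange(T * H * W, dtype=np.float32).reshape(T, H, W)
sd, vtd, htd = decompose_video(clip)
print(sd.shape, vtd.shape, htd.shape)
```

Each stack can then be treated as a set of 2-D images, which is what allows an image-based feature extractor to be applied to temporal content as well as to spatial frames.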

Citation

Pages: 208 - 222

Page count: 15
