Multi-Modal Learning-Based Blind Video Quality Assessment Metric for Synthesized Views

Cited by: 2
Authors
Jin, Chongchong [1, 2]
Peng, Zongju [1, 2]
Chen, Fen [1 ]
Jiang, Gangyi [2 ]
Yu, Mei [2 ]
Affiliations
[1] Chongqing Univ Technol, Sch Elect & Elect Engn, Chongqing 400054, Peoples R China
[2] Ningbo Univ, Fac Informat Sci & Engn, Ningbo 315211, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Measurement; Distortion; Feature extraction; Quality assessment; Video recording; Visualization; Convolutional neural networks; Synthesized video quality assessment; no-reference; multi-model learning; sparse dictionary; IMAGE;
DOI
10.1109/TBC.2023.3284411
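Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract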
Quality degradation in synthesized video directly affects the widespread adoption of immersive video, so it is crucial to design a quality assessment model that can determine whether a synthesized video meets the requirements of commercial broadcasting. However, designing a general-purpose no-reference quality assessment metric for synthesized videos is difficult because view synthesis technology is imperfect and scenes are diverse. Currently, existing quality assessment algorithms for synthesized views are mostly based on handcrafted feature extraction. Inspired by the theory that input stimuli are hierarchically and sparsely processed in the cerebral cortex, we combine Convolutional Neural Network (CNN) learning and sparse dictionary learning mechanisms and propose a Multi-Model Learning based Blind Synthesized Video Quality Assessment (MML-BSVQA) metric. First, to better reflect spatio-temporal distortions, we convert the synthesized video into the Spatial Domain (SD), Vertical Temporal Domain (VTD), and Horizontal Temporal Domain (HTD) using a video decomposition operation plus optical flow estimation. Second, we extract deep semantic features from the three domains using a pre-trained CNN model. Third, we represent the sparse features of the three domains using their respective trained over-complete sparse dictionaries. Note that both the CNN model and the sparse dictionaries are trained on natural videos to ensure the generality of the proposed MML-BSVQA metric. Finally, the score of a synthesized video is generated by weighted regression. Experimental results on three synthesized video databases demonstrate that the proposed metric outperforms classic and state-of-the-art quality assessment metrics.
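As a rough illustration of the decomposition step only, the sketch below slices a video volume into three stacks of 2-D planes. The indexing convention (`video[t][y][x]`), the function name `decompose`, and the assignment of fixed-column planes to VTD and fixed-row planes to HTD are assumptions made for this example; the abstract does not specify them, and the optical flow estimation it mentions is not modeled here.

```python
def decompose(video):
    """Split a video volume indexed as video[t][y][x] into three plane stacks.

    Assumed convention (not taken from the paper):
      SD  planes fix t and span (y, x): the ordinary frames
      VTD planes fix x and span (t, y): vertical temporal slices
      HTD planes fix y and span (t, x): horizontal temporal slices
    """
    n_t, n_y, n_x = len(video), len(video[0]), len(video[0][0])
    # Spatial domain: a copy of every frame.
    sd = [[row[:] for row in frame] for frame in video]
    # Vertical temporal domain: one (t, y) plane per column x.
    vtd = [[[video[t][y][x] for y in range(n_y)] for t in range(n_t)]
           for x in range(n_x)]
    # Horizontal temporal domain: one (t, x) plane per row y.
    htd = [[[video[t][y][x] for x in range(n_x)] for t in range(n_t)]
           for y in range(n_y)]
    return sd, vtd, htd


# Toy volume: 2 frames, 3 rows, 4 columns; each pixel encodes its own (t, y, x).
toy = [[[100 * t + 10 * y + x for x in range(4)] for y in range(3)]
       for t in range(2)]
sd, vtd, htd = decompose(toy)
```

In this sketch each stack could then be passed to a separate feature extractor, which is one plausible reading of "features from three domains"; the actual feature extraction and regression stages are outside its scope.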
Pages: 208-222
Page count: 15