Objective Quality Assessment of Stereoscopic Video Using Inflated 3D Features

Cited by: 0
Authors
Hassan Imani [1]
Md Baharul Islam [1]
Affiliations
[1] Faculty of Engineering and Natural Sciences, Bahcesehir University, Yildiz Ciragan Cd, Istanbul, Besiktas
[2] Department of Computing and Software Engineering, Florida Gulf Coast University, Fort Myers, 33965, FL
Keywords
3-Dimensional convolutional neural networks; Disparity; Human visual system; Motion; Objective quality assessment; Stereoscopic video
DOI
10.1007/s42979-024-03184-7
Abstract
Convolutional Neural Networks (CNNs) have received growing research attention for Stereoscopic Video Quality Assessment (SVQA) in recent years. Researchers have recently used 3D CNNs to extract spatial and temporal features from stereo videos and to detect degradation in stereoscopic video quality. To the best of our knowledge, transfer learning (TL) has not been well examined in SVQA. Pretraining and fine-tuning are approaches used in deep neural networks to transfer knowledge learned in other, more general domains. Previous methods that used TL relied on very heavy 3D ResNet architectures with several layers and are therefore very time-consuming. In this paper, we develop a new model for SVQA that uses the Inflated 3-Dimensional ConvNet (I3D) network as its backbone feature extractor. We first feed the left and right videos to I3D models to extract their features. Then, we apply 3D CNNs to learn quality-aware features from the stereo videos. We evaluate the proposed method on the LFOVIAS3DPh2 and NAMA3DS1-COSPAD1 SVQA datasets. Extensive experiments on the two datasets show that the proposed method correlates well with the subjective results. The Root-Mean-Square Error (RMSE) on the NAMA3DS1-COSPAD1 dataset is 0.2454, and the high Linear Correlation Coefficient (LCC) and Spearman Rank Order Correlation Coefficient (SROCC) values (0.895 and 0.901, respectively) on the LFOVIAS3DPh2 dataset show that the results are consistent with the human visual system (HVS). Despite having a lighter architecture than the best-performing method, the proposed method outperforms most existing methods and is overall the second-best-performing method available. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.
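The three figures of merit named in the abstract (RMSE, LCC, and SROCC) compare predicted quality scores against subjective scores. The following is a minimal sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names and the sample scores are hypothetical, and the abstract does not say whether a nonlinear (for example, logistic) mapping is fitted before LCC and RMSE are computed, so none is applied here.

```python
import math


def rmse(predicted, subjective):
    """Root-mean-square error between predicted and subjective scores."""
    n = len(predicted)
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(predicted, subjective)) / n)


def lcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return lcc(average_ranks(x), average_ranks(y))


# Hypothetical scores for five videos (illustrative only, not from the paper).
predicted_scores = [4.1, 3.2, 2.5, 1.8, 4.6]
subjective_scores = [4.0, 3.5, 2.0, 2.2, 4.8]

print("RMSE :", rmse(predicted_scores, subjective_scores))
print("LCC  :", lcc(predicted_scores, subjective_scores))
print("SROCC:", srocc(predicted_scores, subjective_scores))
```

A lower RMSE and LCC and SROCC values closer to 1 indicate closer agreement with the subjective scores. SROCC depends only on the ordering of the scores, so it is unaffected by any monotonic rescaling of the predictions.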