Visual Attention Modeling for Stereoscopic Video: A Benchmark and Computational Model

Cited by: 53
Authors
Fang, Yuming [1]
Zhang, Chi [1]
Li, Jing [2]
Lei, Jianjun [3]
Da Silva, Matthieu Perreira [2]
Le Callet, Patrick [2]
Affiliations
[1] Jiangxi Univ Finance & Econ, Sch Informat Technol, Nanchang 330032, Jiangxi, People's Republic of China
[2] Univ Nantes, Polytech Nantes, F-44306 Nantes, France
[3] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Keywords
Visual attention; stereoscopic video; spatiotemporal; saliency detection; Gestalt theory; quality assessment; compressed domain; 3D; image; motion
DOI
10.1109/TIP.2017.2721112
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we investigate visual attention modeling for stereoscopic video from two aspects. First, we build a large-scale eye-tracking database as a benchmark for visual attention modeling for stereoscopic video. The database includes 47 video sequences and their corresponding eye-fixation data. Second, we propose a novel computational model of visual attention for stereoscopic video based on Gestalt theory. In the proposed model, we extract low-level features, including luminance, color, texture, and depth, from discrete cosine transform (DCT) coefficients and use them to calculate feature contrast for spatial saliency computation. Temporal saliency is calculated from the motion contrast of planar and depth motion features in the stereoscopic video sequences. The final saliency is estimated by fusing the spatial and temporal saliency with uncertainty weighting, where the uncertainty weights are estimated from the Gestalt laws of proximity, continuity, and common fate. Experimental results show that the proposed method outperforms state-of-the-art stereoscopic video saliency detection models on our large-scale eye-tracking database and on another database (DML-ITRACK-3D).
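The fusion step described in the abstract can be illustrated with a short sketch. The following Python snippet is a minimal, hypothetical example of uncertainty-weighted fusion of a spatial and a temporal saliency map; the inverse-uncertainty weighting scheme, the function name fuse_saliency, and the stand-in inputs are assumptions for illustration only, not the authors' released implementation.

import numpy as np

def fuse_saliency(spatial, temporal, u_spatial, u_temporal, eps=1e-8):
    """Fuse spatial and temporal saliency maps with uncertainty weighting.

    spatial, temporal : 2-D arrays, saliency maps in [0, 1]
    u_spatial, u_temporal : scalars or 2-D arrays, uncertainty estimates
        (in the paper these are derived from the Gestalt laws of proximity,
        continuity, and common fate; here they are simply given as inputs)
    """
    w_s = 1.0 / (u_spatial + eps)   # lower uncertainty -> higher weight
    w_t = 1.0 / (u_temporal + eps)
    fused = (w_s * spatial + w_t * temporal) / (w_s + w_t)
    # normalize to [0, 1] for display / evaluation
    fused = (fused - fused.min()) / (fused.max() - fused.min() + eps)
    return fused

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s_map = rng.random((72, 128))   # stand-in spatial saliency map
    t_map = rng.random((72, 128))   # stand-in temporal saliency map
    out = fuse_saliency(s_map, t_map, u_spatial=0.2, u_temporal=0.6)
    print(out.shape, out.min(), out.max())

In this toy run, the lower spatial uncertainty gives the spatial map the larger weight; in the paper the weights are derived per sequence from the Gestalt-based analysis rather than fixed by hand.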
Pages: 4684-4696
Number of pages: 13