Toward Naturalistic 2D-to-3D Conversion

Times Cited: 17
Authors
Huang, Weicheng [1 ]
Cao, Xun [2 ]
Lu, Ke [1 ]
Dai, Qionghai [3 ]
Bovik, Alan Conrad [4 ]
Affiliations
[1] Univ Chinese Acad Sci, Coll Engn & Informat Technol, Beijing 100049, Peoples R China
[2] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[4] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
Funding
National Natural Science Foundation of China; US National Science Foundation;
Keywords
2D-to-3D conversion; depth propagation; natural scene statistics; Bayesian inference; SPATIAL-FREQUENCY; VIDEO; IMAGE; PREDICTION; COLOR;
DOI
10.1109/TIP.2014.2385474
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Natural scene statistics (NSS) models have been developed that make it possible to impose useful, perceptually relevant priors on the luminance, color, and depth maps of natural scenes. We show that these models can be used to build 3D content creation algorithms that convert monocular 2D videos into statistically natural 3D-viewable videos. First, accurate depth information on key frames is obtained via human annotation. Next, forward and backward motion vectors are estimated and compared to determine the initial depth values, and a compensation process is applied to further improve the depth initialization. The luminance/chrominance channels and the initial depth map are then decomposed by a Gabor filter bank, and each depth subband is modeled to produce an NSS prior term. These statistical color-depth priors are combined with a spatial smoothness constraint to form the prior regularizing term of the depth propagation target function. The final depth map associated with each frame of the input 2D video is obtained by minimizing this target function over all subbands. Finally, stereoscopic frames are rendered from the color frames and their associated depth maps. We evaluated the quality of the generated 3D videos using both subjective and objective quality assessment methods. Experimental results on a variety of sequences show that the presented method outperforms several state-of-the-art 2D-to-3D conversion methods.
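The abstract describes the pipeline only at a high level. The following is a minimal sketch, not the authors' implementation, of the kind of target function it mentions: a data term tying a frame's depth to its motion-compensated initialization, an NSS-style heavy-tailed prior (here an L1/Laplacian stand-in) on Gabor subbands of the depth map, and a color-weighted spatial smoothness term, minimized with a generic optimizer on a toy frame. All filter settings, weights, and the specific form of the prior are illustrative assumptions.

```python
# Minimal sketch of an NSS-regularized depth propagation objective (assumed form).
import numpy as np
from scipy.optimize import minimize
from scipy.signal import fftconvolve


def gabor_kernel(theta, lambd=6.0, sigma=3.0, size=11):
    """Small real Gabor kernel at orientation theta (zero-mean)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xr / lambd)
    return g - g.mean()


def target(d_flat, shape, d_init, luma, kernels, w_prior=0.1, w_smooth=1.0):
    """Data term + NSS-style subband prior + color-weighted smoothness."""
    d = d_flat.reshape(shape)
    data = np.sum((d - d_init) ** 2)              # stay near the motion-based initialization
    prior = sum(np.sum(np.abs(fftconvolve(d, k, mode='same')))
                for k in kernels)                 # L1 (Laplacian-like) prior on each Gabor subband
    gy, gx = np.gradient(d)
    ly, lx = np.gradient(luma)
    w = np.exp(-(lx ** 2 + ly ** 2))              # encourage smooth depth where color is smooth
    smooth = np.sum(w * (gx ** 2 + gy ** 2))
    return data + w_prior * prior + w_smooth * smooth


rng = np.random.default_rng(0)
luma = rng.random((16, 16))                       # toy luminance frame
d_init = rng.random((16, 16))                     # toy motion-compensated initial depth
kernels = [gabor_kernel(t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]
res = minimize(target, d_init.ravel(), method='L-BFGS-B',
               args=(d_init.shape, d_init, luma, kernels),
               options={'maxiter': 25})
depth = res.x.reshape(d_init.shape)               # refined depth map for this frame
```

In the paper, the prior is learned from measured color-depth co-statistics and the minimization runs over all subbands of full video frames; the toy example above only mirrors the structure of that objective.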
Pages: 724-733
Page count: 10