DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks

Cited by: 50
Authors
Das, Sagnik [1]
Ma, Ke [1]
Shu, Zhixin [1]
Samaras, Dimitris [1]
Shilkrot, Roy [1]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
Keywords
SHAPE; ORIENTATION
DOI
10.1109/ICCV.2019.00022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Capturing document images with hand-held devices in unstructured environments is now common practice. However, "casual" photos of documents are usually unsuitable for automatic information extraction, mainly because of physical distortion of the paper and variation in camera position and illumination. In this work, we propose DewarpNet, a deep-learning approach for unwarping a document from a single image. Our insight is that the 3D geometry of the document not only determines the warping of its texture but also causes the illumination effects. Our novelty therefore lies in explicitly modeling the 3D shape of the document paper in an end-to-end pipeline. We also contribute Doc3D, the largest and most comprehensive dataset for document image unwarping to date. This dataset features multiple ground-truth annotations, including 3D shape, surface normals, UV map, and albedo image. Training with Doc3D, we demonstrate state-of-the-art performance for DewarpNet with extensive qualitative and quantitative evaluations. Our network also significantly improves OCR performance on captured document images, decreasing character error rate by 42% on average. Both the code and the dataset are released.
Pages: 131-140
Page count: 10