Frame selection for OCR from video stream of book flipping

Cited by: 0

Authors:

Dibyayan Chakraborty

Partha Pratim Roy

Rajkumar Saini

Jose M. Alvarez

Umapada Pal

Affiliations:

[1] ISI Kolkata, Computer Vision and Pattern Recognition Unit

[2] IIT Roorkee, Department of Computer Science and Engineering

[3] Canberra Research Lab

[4] ACT

Source:

Multimedia Tools and Applications | 2018 / Vol. 77

Keywords:

Video OCR; OCR of flipping book; Video document image

DOI:

Not available

Abstract:

Optical Character Recognition (OCR) in a video stream of flipping pages is a challenging task because flipping at a random speed makes it difficult to identify the frames that contain the open page image (OPI). In addition, low resolution, blur, shadows and similar degradations add significant noise to the selection of suitable frames for OCR. In this paper, we focus on identifying a set of representative frames from the video stream of flipping pages without any dedicated hardware, and then perform OCR on these frames. Thus, an end-to-end solution is proposed for video streams of flipping pages. To select an OPI, we present an efficient algorithm that exploits cues from edge information during the flipping event. These cues, extracted from the region of interest (ROI) of the frame, determine whether a page is in the flipping or the open state. The open state is classified by an SVM classifier trained on the edge cues. After a set of frames has been selected for each OPI, a representative frame from that set is chosen for OCR. Experiments are performed on videos captured with a standard-resolution camera. The proposed method achieves 88.81 % accuracy in representative frame selection, whereas GIST (Oliva and Torralba, Int J Comput Vis 42(3):145–175 (2001)) achieves only 51.28 %. To the best of our knowledge, this is the first work in this area. After frame selection, we achieve 83.31 % character recognition accuracy and 78.11 % word recognition accuracy with traditional OCR on our dataset of flipping-book videos.
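The abstract describes the frame-selection pipeline only at a high level: edge cues from an ROI, an SVM deciding between the open and flipping states, grouping of frames per OPI, and the choice of one representative frame. The Python sketch below shows one way these steps could fit together; it is not the authors' implementation, and several parts are assumptions. Frames are assumed to be grayscale images stored as lists of pixel rows; the hypothetical `edge_density` (the fraction of ROI pixels with a large forward-difference gradient) stands in for the paper's edge cues; the `is_open` callable stands in for the trained SVM; and picking the sharpest frame of each run by Laplacian variance is an assumed selection rule, since the abstract does not say how the representative is chosen.

```python
from typing import Callable, List, Sequence, Tuple

Frame = List[List[int]]          # grayscale image: rows of 0-255 pixel values
ROI = Tuple[int, int, int, int]  # hypothetical (top, left, bottom, right), end-exclusive


def edge_density(frame: Frame, roi: ROI, thresh: int = 40) -> float:
    """Fraction of ROI pixels whose |dx| + |dy| forward difference exceeds thresh.

    Hypothetical stand-in for the paper's edge cues (not specified in the abstract).
    """
    top, left, bottom, right = roi
    edges = total = 0
    for y in range(top, bottom - 1):
        for x in range(left, right - 1):
            gx = frame[y][x + 1] - frame[y][x]
            gy = frame[y + 1][x] - frame[y][x]
            total += 1
            if abs(gx) + abs(gy) > thresh:
                edges += 1
    return edges / total if total else 0.0


def sharpness(frame: Frame) -> float:
    """Variance of the 4-neighbour Laplacian; blurrier frames score lower."""
    h, w = len(frame), len(frame[0])
    vals = [
        4 * frame[y][x] - frame[y - 1][x] - frame[y + 1][x]
        - frame[y][x - 1] - frame[y][x + 1]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def group_open_runs(labels: Sequence[bool]) -> List[List[int]]:
    """Group consecutive open-state frame indices; each run is one OPI set."""
    runs: List[List[int]] = []
    current: List[int] = []
    for i, is_open in enumerate(labels):
        if is_open:
            current.append(i)
        elif current:  # a flipping-state frame closes the current run
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def select_representatives(
    frames: Sequence[Frame], roi: ROI, is_open: Callable[[float], bool]
) -> List[int]:
    """Return one frame index per OPI set, choosing the sharpest frame (assumed rule)."""
    labels = [is_open(edge_density(f, roi)) for f in frames]
    return [max(run, key=lambda i: sharpness(frames[i])) for run in group_open_runs(labels)]


if __name__ == "__main__":
    def checker(n: int, lo: int, hi: int) -> Frame:
        # Synthetic "text-like" texture: alternating dark and bright pixels.
        return [[hi if (x + y) % 2 == 0 else lo for x in range(n)] for y in range(n)]

    flat: Frame = [[128] * 6 for _ in range(6)]  # textureless, standing in for a blurred flip
    frames = [checker(6, 0, 200), checker(6, 50, 150), flat, flat, checker(6, 0, 255)]
    # A simple threshold plays the role of the trained SVM here.
    print(select_representatives(frames, (0, 0, 6, 6), lambda d: d > 0.5))  # [0, 4]
```

The threshold lambda in the example only illustrates the interface; in a real system `is_open` would wrap a classifier trained on labelled open and flipping frames (for instance scikit-learn's `SVC`), and the single edge-density value would be replaced by richer edge cues. Treating each run of consecutive open-state frames as one OPI set assumes that every page turn produces at least one flipping-state frame between two open pages.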
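The abstract reports character and word recognition accuracy but does not define them. A common convention, assumed here rather than taken from the paper, scores a recognized text against its ground truth as one minus the Levenshtein edit distance divided by the ground-truth length, computed over characters for character accuracy and over whitespace-separated words for word accuracy. A minimal Python sketch; all function names are hypothetical:

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (or match at no cost)
            ))
        prev = cur
    return prev[-1]


def recognition_accuracy(truth: Sequence, predicted: Sequence) -> float:
    """Assumed metric: 1 - edit_distance / len(truth), clipped at 0."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - levenshtein(truth, predicted) / len(truth))


def char_accuracy(truth: str, predicted: str) -> float:
    return recognition_accuracy(truth, predicted)  # units are characters


def word_accuracy(truth: str, predicted: str) -> float:
    return recognition_accuracy(truth.split(), predicted.split())  # units are words


if __name__ == "__main__":
    gt, ocr = "open page image", "opcn page image"
    print(round(char_accuracy(gt, ocr), 4))  # 0.9333
    print(round(word_accuracy(gt, ocr), 4))  # 0.6667
```

In the example, a single misrecognized character ("e" read as "c") costs one character out of 15 but one word out of 3, which illustrates why word accuracy tends to be lower than character accuracy under this kind of metric.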

Pages: 985–1008

Page count: 23

References (29 in total):

[1]

Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8:679–698

[2]

Lee CW, Jung K, Kim HJ (2003) Automatic text detection and removal in video sequences. Pattern Recogn Lett 24:2607–2623

[3]

Lienhart R, Wernicke A (2002) Localizing and segmenting text in images and videos. IEEE Trans Circuits Syst Video Technol 12:256–268

[4]

Ngo CW, Chan CK (2005) Video text detection and segmentation for optical character recognition. Multimedia Systems 10:261–272

[5]

Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invariant texture classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell 24:971–987

[6]

Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42:145–175

[7]

Otsu N (1979) A threshold selection method from gray-level histograms. IEEE Trans Syst Man Cybern 9:62–66

[8]

Sauvola J, Pietikäinen M (2000) Adaptive document image binarization. Pattern Recogn 33:225–236

[9]

Su B, Lu S, Tan CL (2013) Robust document image binarization technique for degraded document images. IEEE Trans Image Process 22:1408–1417

[10]

Yi C, Tian Y (2011) Text string detection from natural scenes by structure-based partition and grouping. IEEE Trans Image Process 20:2594–2605
