An automatic histogram detection and information extraction from document images

Cited by: 2
Authors
Anagha, P. H. [1]
Baskar, A. [1]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Dept Comp Sci & Engn, Coimbatore, Tamil Nadu, India
Keywords
Histogram; Hough line detector; Morphological operator; Information extraction
DOI
10.1007/s10772-020-09756-1
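Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Subject classification code
0808; 0809
Abstract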
The histogram is an important data chart that commonly appears in scientific documents. This paper proposes an automatic histogram detection and information extraction method based on the Hough line detector and morphological operators. The proposed system comprises three steps: pre-processing, axis detection, and chart pattern extraction. In the pre-processing step, the RGB image of a histogram is converted into a binary image. In the axis detection step, the horizontal axis, the vertical axis, and the title of the histogram are extracted. Here the Hough line detector is applied to detect the horizontal and vertical lines in the image. From the set of detected vertical lines, the line whose two endpoints share the minimum x-coordinate is taken as the vertical axis. Similarly, from the set of detected horizontal lines, the line whose two endpoints share the maximum y-coordinate is taken as the horizontal axis. Based on the dimensions of the two axes, rectangular regions containing the horizontal axis values and label, the vertical axis values and label, and the title are extracted. In the final chart pattern extraction step, morphological operations are used to identify the frequency of the data shown in the histogram. Verification and validation tests of the proposed system yielded promising results, indicating that the approach is efficient for extracting histogram information.
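The axis selection rule described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the line segments have already been produced by some Hough line detector, and the names `Segment`, `select_axes`, and the pixel tolerance `tol` are hypothetical. It also assumes image coordinates, where y grows downward, so the bottom-most horizontal line has the largest y.

```python
from typing import List, Optional, Tuple

# A segment is (x1, y1, x2, y2) in image coordinates: x grows to the right,
# y grows downward, so the bottom of the image has the largest y.
Segment = Tuple[int, int, int, int]


def select_axes(
    segments: List[Segment], tol: int = 2
) -> Tuple[Optional[Segment], Optional[Segment]]:
    """Pick (vertical_axis, horizontal_axis) from detected line segments.

    tol is a hypothetical pixel tolerance for treating a segment as
    vertical or horizontal; the abstract does not specify one.
    """
    # Vertical segments have (nearly) equal x at both endpoints.
    vertical = [s for s in segments if abs(s[0] - s[2]) <= tol]
    # Horizontal segments have (nearly) equal y at both endpoints.
    horizontal = [s for s in segments if abs(s[1] - s[3]) <= tol]

    # Vertical axis: the vertical segment with the smallest x-coordinate.
    v_axis = min(vertical, key=lambda s: min(s[0], s[2]), default=None)
    # Horizontal axis: the horizontal segment with the largest y-coordinate.
    h_axis = max(horizontal, key=lambda s: max(s[1], s[3]), default=None)
    return v_axis, h_axis


# Hypothetical segments, as a Hough line detector might return them.
segments = [
    (50, 20, 50, 300),     # left-most vertical line
    (90, 120, 90, 300),    # side of a bar
    (130, 120, 130, 300),  # side of a bar
    (50, 300, 400, 300),   # bottom-most horizontal line
    (90, 120, 130, 120),   # top of a bar
]
v_axis, h_axis = select_axes(segments)
print(v_axis, h_axis)
```

In this toy input the bar edges are also vertical and horizontal lines, which is why the rule picks the extreme x and y positions rather than, say, the longest line.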
Pages: 77-85
Page count: 9