A Deep Learning-based Formula Detection Method for PDF Documents

Times cited: 19

Authors:

Gao, Liangcai [1]

Yi, Xiaohan [1]

Liao, Yuan [1]

Jiang, Zhuoren [1]

Yan, Zuoyu [1]

Tang, Zhi [1]

Affiliation:

[1] Peking Univ, ICST, Beijing, Peoples R China

Source:

2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Vol. 1 | 2017

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation

Keywords:

formula detection; deep learning; PDF documents

DOI:

10.1109/ICDAR.2017.96

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In practice, PDF files may be generated by different tools, and the quality of their character information can vary. As a result, methods for detecting formulae in PDF documents often perform very differently on different PDF files. To address this problem, this paper combines and refines Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) models to detect formulae from both their character features and their visual features. Based on the characteristics of PDF documents, we propose a series of strategies for training and optimizing the deep networks, such as an implicit class down-sampling strategy that reduces the imbalance between formulae and other page elements (e.g., text paragraphs, tables, and figures). The region proposal method is also redesigned to generate a moderate number of formula candidates by combining bottom-up and top-down layout analysis. The experimental results show that combining the CNN and RNN increases the robustness of the proposed detection method. Furthermore, the proposed method outperforms existing formula detection methods on both a ground-truth dataset and a larger self-built dataset, which will be released for research purposes.
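The abstract names an implicit class down-sampling strategy for the imbalance between formulae and other page elements but does not describe how it works. As a rough illustration of the general idea only, the sketch below applies plain, explicit random down-sampling to non-formula candidate regions before training. It is not the authors' implicit strategy, and the function name `downsample_majority`, the `max_ratio` parameter, and the labeling convention (1 = formula, 0 = other element) are hypothetical assumptions.

```python
import random
from typing import List, Tuple

# Hypothetical sample format: (candidate region id, label), label 1 = formula, 0 = other.
Sample = Tuple[str, int]


def downsample_majority(samples: List[Sample], max_ratio: float = 3.0,
                        seed: int = 0) -> List[Sample]:
    """Hypothetical helper: explicit random down-sampling, NOT the paper's
    implicit strategy. Keeps every formula sample and at most `max_ratio`
    non-formula samples per formula sample, chosen uniformly at random."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    formulas = [s for s in samples if s[1] == 1]
    others = [s for s in samples if s[1] == 0]
    budget = int(max_ratio * len(formulas))
    if len(others) > budget:
        # Drop surplus majority-class samples without replacement.
        others = rng.sample(others, budget)
    balanced = formulas + others
    rng.shuffle(balanced)  # mix the two classes for training
    return balanced
```

For example, with 10 formula samples and 100 non-formula samples, `downsample_majority(samples, max_ratio=3.0)` keeps all 10 formulas plus 30 randomly chosen non-formula samples. A fixed seed makes the subset repeatable; drawing a fresh subset each epoch would let the network eventually see more of the majority class.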


Pages: 553–558

Number of pages: 6

References: 23 in total ([1]–[10] listed below)

[1] Afzal Muhammad Zeshan, 2015, ICDAR 2015 13 INT C

[2] [Anonymous], S INT SYST EXP APPL

[3] Bahdanau D, 2016, arXiv:1409.0473, DOI 10.48550/ARXIV.1409.0473

[4] Chang Tzu-Yuan, 2007, ICDAR 2007 9 INT C I, V2

[5] Chowdhury SP, 2003, PROC INT CONF DOC, P755

[6] Deng Yuntian, 2016, What you get is what you see: A visual markup decompiler

[7] Fateman RJ, Tokuyasu T, Berman BP, Mitchell N, 1996, Optical character recognition and parsing of typeset mathematics, JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 7(1): 2-15

[8] Garain Utpal, 2009, DOC AN REC 2009 ICDA

[9] Garain Utpal, 2000, PATT REC 2000 P 15 I, V4

[10] Ha Jaekyu, 1995, DOC AN REC 1995 P 3, V2
