CASIA-onDo: A New Database for Online Handwritten Document Analysis

Times cited: 0
Authors
Yang, Yu-Ting [1,2]
Zhang, Yan-Ming [1]
Yun, Xiao-Long [1,2]
Yin, Fei [1]
Liu, Cheng-Lin [1,2]
Affiliations
[1] National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
[2] School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
Source
Pattern Recognition, ACPR 2021, Part II | 2022 / Vol. 13189
Funding
National Natural Science Foundation of China
Keywords
Online handwritten document; Document layout analysis; Stroke classification; Database; Recognition; Text
DOI
10.1007/978-3-031-02444-3_13
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we introduce an online handwritten document database, CASIA-onDo, which serves as a standard database for developing and evaluating methods for online handwritten document layout analysis. It consists of 2,012 documents containing a total of 841,159 online strokes. The database covers Chinese and English and was produced by 200 writers. Six types of content appear in the documents: text, formulas, diagrams, tables, figures, and lists, and their distribution is close to that of real-world documents. Thanks to its detailed annotations, CASIA-onDo can support various layout analysis tasks in both online and offline settings. First, based on the semantic-level annotation, it can be used for classification tasks such as text/non-text classification, table/non-table classification, and multi-class stroke classification. Second, based on the instance-level annotation, it can be used for segmentation tasks such as text line separation and formula segmentation. Third, because it contains a wide variety of writing styles, it can be used for handwriting recognition and writer clustering. In addition, we conduct preliminary experiments with a state-of-the-art method to provide a benchmark on this database. More techniques can be evaluated on this challenging database in the future.
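The abstract describes two annotation levels: a semantic label per stroke (one of six content types) and an instance label grouping strokes into objects such as text lines or formulas. The sketch below illustrates, under stated assumptions, how binary tasks like text/non-text or table/non-table can be derived from six-class stroke labels, and how instance IDs yield groups for segmentation. The `Stroke` record, its `(x, y, t)` point layout, the `instance` field, and the helper names are all hypothetical; they are not the database's actual file format or API. Which classes count as "text" in the text/non-text task is also an assumption left to the caller.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# The six content types named in the abstract (label strings are assumed).
CLASSES = ("text", "formula", "diagram", "table", "figure", "list")


@dataclass
class Stroke:
    # Hypothetical layout: sampled pen points as (x, y, t) tuples.
    points: List[Tuple[float, float, float]]
    # Semantic-level label: one of CLASSES.
    semantic: str
    # Instance-level label: hypothetical ID of the text line, formula, table, etc.
    instance: int

    def __post_init__(self) -> None:
        # Reject labels outside the six assumed content types.
        if self.semantic not in CLASSES:
            raise ValueError(f"unknown semantic label: {self.semantic!r}")


def to_binary(strokes: List[Stroke], positive: Iterable[str]) -> List[int]:
    """Collapse six-class labels into a binary task (1 = positive class).

    Example: positive={"table"} gives table/non-table labels. Which classes
    form the positive side of text/non-text is a caller-side assumption.
    """
    pos = set(positive)
    return [1 if s.semantic in pos else 0 for s in strokes]


def group_instances(strokes: List[Stroke]) -> Dict[int, List[int]]:
    """Group stroke indices by instance ID, e.g. to recover text lines."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, s in enumerate(strokes):
        groups[s.instance].append(i)
    return dict(groups)
```

For example, with two text strokes sharing instance 0 and one table stroke in instance 1, `to_binary(strokes, {"text"})` returns `[1, 0, 1]` and `group_instances(strokes)` returns `{0: [0, 2], 1: [1]}`. Keeping the multi-class label as the single source of truth means each binary task is just a different choice of positive set.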
Pages: 174-188
Number of pages: 15