ICDAR2017 Competition on Post-OCR Text Correction

Cited by: 26
Authors
Chiron, Guillaume [1]
Doucet, Antoine [2]
Coustaty, Mickael [2]
Moreux, Jean-Philippe [1]
Affiliations
[1] Natl Lib France, F-75706 Paris, France
[2] Univ La Rochelle, Lab L3i, Av Michel Crepeau, F-17000 La Rochelle, France
Source
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1 | 2017
Keywords
DOI
10.1109/ICDAR.2017.232
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the ICDAR2017 competition on post-OCR text correction and presents the methods submitted by the participants. OCR has been an active research field for over 30 years, but results are still imperfect, especially for historical documents. The purpose of this competition is to compare and evaluate automatic approaches for correcting (denoising) OCR-ed texts. The challenge consists of two independent tasks: 1) error detection and 2) error correction. An original dataset of 12M OCR-ed symbols, along with an aligned ground truth, was provided to the participants, with 80% of the dataset dedicated to training and 20% to evaluation. Different sources were aggregated, notably newspapers and monographs, covering two languages (English and French). Eleven teams submitted results; the difficulty of the task was underlined by the fact that only half of the submitted methods were able to denoise the evaluation dataset on average. In any case, this competition, which counted 35 registrations, illustrates the strong interest of the community in this essential problem, which is key to any digitization process involving textual data.
Pages: 1423-1428
Page count: 6