ICDAR2017 Competition on Post-OCR Text Correction

Cited by: 26
Authors
Chiron, Guillaume [1]
Doucet, Antoine [2]
Coustaty, Mickael [2]
Moreux, Jean-Philippe [1]
Affiliations
[1] Natl Lib France, F-75706 Paris, France
[2] Univ La Rochelle, Lab L3i, Av Michel Crepeau, F-17000 La Rochelle, France
Source
2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1 | 2017
DOI
10.1109/ICDAR.2017.232
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes the ICDAR2017 competition on post-OCR text correction and presents the different methods submitted by the participants. OCR has been an active research field for more than 30 years, but results are still imperfect, especially for historical documents. The purpose of this competition is to compare and evaluate automatic approaches for correcting (denoising) OCR-ed texts. The challenge consists of two independent tasks: 1) error detection and 2) error correction. An original dataset of 12M OCR-ed symbols, along with an aligned ground truth, was provided to the participants, with 80% of the dataset dedicated to training and 20% to evaluation. Different sources were aggregated; they notably include newspapers and monographs covering two languages (English and French). Eleven teams submitted results, and the difficulty of the task was underlined by the fact that only half of the submitted methods were able to denoise the evaluation dataset on average. In any case, this competition, which counted 35 registrations, illustrates the strong interest of the community in this essential problem, which is key to any digitization process involving textual data.
Pages: 1423 - 1428
Page count: 6
Related papers
50 items in total
  • [31] ICDAR2017 Robust Reading Challenge on Multi-lingual Scene Text Detection and Script Identification - RRC-MLT
    Nayef, Nibal
    Yin, Fei
    Bizid, Imen
    Choi, Hyunsoo
    Feng, Yuan
    Karatzas, Dimosthenis
    Luo, Zhenbo
    Pal, Umapada
    Rigaud, Christophe
    Chazalon, Joseph
    Khlif, Wafa
    Luqman, Muhammad Muzzamil
    Burie, Jean-Christophe
    Liu, Cheng-Lin
    Ogier, Jean-Marc
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1454 - 1459
  • [32] ICDAR 2024 Competition on Artistic Text Recognition
    Xie, Xudong
    Deng, Linger
    Zhang, Zhifei
    Wang, Zhaowen
    Liu, Yuliang
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT VI, 2024, 14809 : 301 - 314
  • [33] ICDAR 2005 text locating competition results
    Lucas, SM
    EIGHTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2, PROCEEDINGS, 2005, : 80 - 84
  • [34] ICDAR2015 Competition on Smartphone Document Capture and OCR (SmartDoc)
    Burie, J. C.
    Chazalon, J.
    Coustaty, M.
    Eskenazi, S.
    Luqman, M. M.
    Mehri, M.
    Nayef, N.
    Ogier, J. M.
    Prum, S.
    Rusinol, M.
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1161 - 1165
  • [35] OCR post-correction for detecting adversarial text images
    Imam, Niddal H.
    Vassilakis, Vassilios G.
    Kolovos, Dimitris
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2022, 66
  • [36] ICDAR 2021 Competition on Scene Video Text Spotting
    Cheng, Zhanzhan
    Lu, Jing
    Zou, Baorui
    Zhou, Shuigeng
    Wu, Fei
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 650 - 662
  • [37] Cleaning Dirty Books: Post-OCR Processing for Previously Scanned Texts
    Kim, Allen
    Pethe, Charuta
    Inoue, Naoya
    Skiena, Steven
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 4217 - 4226
  • [38] ICDAR 2015 Competition on Text Line Detection in Historical Documents
    Murdock, Michael
    Reid, Shawn
    Hamilton, Blaine
    Reese, Jackson
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015, : 1171 - 1175
  • [39] ICDAR 2017 Competition on the Classification of Medieval Handwritings in Latin Script
    Cloppet, Florence
    Eglin, Veronique
    Helias-Baron, Marlene
    Kieu, Cuong
    Stutzmann, Dominique
    Vincent, Nicole
    2017 14TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), VOL 1, 2017, : 1371 - 1376
  • [40] ICDAR 2024 Competition on Handwritten Text Recognition in Brazilian Essays - BRESSAY
    Neto, Arthur F. S.
    Bezerra, Byron L. D.
    Araujo, Savio S.
    Souza, Wiliane M. A. S.
    Alves, Kleberson F.
    Oliveira, Macileide F.
    Lins, Samara V. S.
    Hazin, Hugo J. F.
    Rocha, Pedro H., V
    Toselli, Alejandro H.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT VI, 2024, 14809 : 345 - 362