The use of Optical Character Recognition (OCR) in the digitisation of herbarium specimen labels

Times cited: 17
Authors
Drinkwater, Robyn E. [1]
Cubey, Robert W. N. [1]
Haston, Elspeth M. [1]
Affiliation
[1] Royal Botanic Garden Edinburgh, Edinburgh EH3 5LR, Midlothian, Scotland
Funding
Andrew W. Mellon Foundation (USA)
Keywords
OCR; Digitisation; Data entry; Specimen; Label; Herbarium; Biological collections; Workflows
DOI
10.3897/phytokeys.38.7168
Chinese Library Classification (CLC) number
Q94 [Botany]
Discipline classification code
071001
Abstract
At the Royal Botanic Garden Edinburgh (RBGE), the use of Optical Character Recognition (OCR) to aid the digitisation process has been investigated. This was tested using a herbarium specimen digitisation process with two stages of data entry. Records were first batch-processed to add data extracted from the OCR text and were then sorted by Collector and/or Country. A team of six digitisers then added further data to the specimen records from images of the specimens. To investigate whether the OCR-derived data aid the digitisation process, the digitisers completed a series of trials comparing the efficiency of data entry between sorted and unsorted batches of specimens. A survey was also carried out to gauge the digitisation staff's opinions on the different sorting options. In total, 7,200 specimens were processed. Specimens sorted using data added from the OCR were quicker to digitise than an unsorted, random set of specimens. Of the methods tested, the most efficient used a protocol in which data were entered into a limited set of fields and records were filtered by both Collector and Country. The survey and subsequent discussions with the digitisation staff highlighted their preference for working with sorted specimens, in which label layout, locations and handwriting are likely to be similar, so that familiarity with the Collector or Country is established rapidly.
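The batch step described in the abstract (add data from OCR text, then sort records by Collector and Country before manual entry) can be illustrated with a minimal Python sketch. It assumes the OCR text is already stored per record and that a controlled list of known collector and country names is available; the `SpecimenRecord` fields, the whole-word matching rule, and the example names and barcodes are hypothetical and are not taken from the RBGE workflow.

```python
import re
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class SpecimenRecord:
    # Hypothetical record layout: a barcode plus the raw OCR text of the label.
    barcode: str
    ocr_text: str
    collector: Optional[str] = None
    country: Optional[str] = None


def find_first(text: str, candidates: list[str]) -> Optional[str]:
    """Return the first candidate (in list order) found in text as a whole word, ignoring case."""
    for name in candidates:
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE):
            return name
    return None


def enrich_from_ocr(records, collectors, countries):
    """Batch step: fill empty collector/country fields from the OCR text where a known value occurs."""
    for rec in records:
        rec.collector = rec.collector or find_first(rec.ocr_text, collectors)
        rec.country = rec.country or find_first(rec.ocr_text, countries)
    return records


def sort_key(rec):
    # Sort by Country, then Collector; records with no OCR match go last.
    return (rec.country is None, rec.country or "",
            rec.collector is None, rec.collector or "", rec.barcode)


def make_batches(records):
    """Group sorted records into (country, collector) batches for the digitisers."""
    ordered = sorted(records, key=sort_key)
    return [(key, list(group))
            for key, group in groupby(ordered, key=lambda r: (r.country, r.collector))]


# Hypothetical example data.
collectors = ["Smith", "Jones"]
countries = ["Nepal", "Peru"]
records = [
    SpecimenRecord("B1", "Plants of Peru. Leg. Jones 45"),
    SpecimenRecord("B2", "Nepal: Kathmandu valley. Coll. Smith 12"),
    SpecimenRecord("B3", "illegible label"),
    SpecimenRecord("B4", "Peru, Cusco. Jones 46"),
]
enrich_from_ocr(records, collectors, countries)
batches = make_batches(records)
for (country, collector), group in batches:
    print(country, collector, [r.barcode for r in group])
```

Records whose OCR text matches no known name end up in a final `(None, None)` batch, which would still have to be digitised without the benefit of sorting.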
Pages: 15–30
Page count: 16