Maximizing human effort for analyzing scientific images: A case study using digitized herbarium sheets

Times cited: 12
Authors
Brenskelle, Laura [1, 2]
Guralnick, Rob P. [1]
Denslow, Michael [1]
Stucky, Brian J. [1]
Affiliations
[1] Univ Florida, Florida Museum Nat Hist, Gainesville, FL 32611 USA
[2] Univ Florida, Dept Biol, Gainesville, FL 32611 USA
Funding
U.S. National Science Foundation;
Keywords
citizen science; herbarium specimens; image annotation; machine learning; phenology; specimen images;
DOI
10.1002/aps3.11370
Chinese Library Classification number
Q94 [Botany];
Discipline classification code
071001;
Abstract
Premise: Digitization and imaging of herbarium specimens provides essential historical phenotypic and phenological information about plants. However, the full use of these resources requires high-quality human annotations for downstream use. Here we provide guidance on the design and implementation of image annotation projects for botanical research.
Methods and Results: We used a novel gold-standard data set to test the accuracy of human phenological annotations of herbarium specimen images in two settings: structured, in-person sessions and an online, community-science platform. We examined how different factors influenced annotation accuracy and found that botanical expertise, academic career level, and time spent on annotations had little effect on accuracy. Rather, key factors included traits and taxa being scored, the annotation setting, and the individual scorer. In-person annotations were significantly more accurate than online annotations, but both generated relatively high-quality outputs. Gathering multiple, independent annotations for each image improved overall accuracy.
Conclusions: Our results provide a best-practices basis for using human effort to annotate images of plants. We show that scalable community science mechanisms can produce high-quality data, but care must be taken to choose tractable taxa and phenophases and to provide informative training material.
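The abstract reports that gathering multiple, independent annotations for each image improved overall accuracy when scored against a gold-standard data set. As a rough illustration only, the sketch below combines several annotations per image by simple majority vote and compares the result with a single annotator. Majority voting is an assumption made here for illustration; the abstract does not state how the authors aggregated annotations. The image identifiers, label names, and data are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None if there is a tie or no labels."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: no consensus
    return counts[0][0]


def accuracy(predicted, gold):
    """Fraction of gold-standard images whose predicted label matches."""
    matches = sum(
        1 for image_id, label in gold.items() if predicted.get(image_id) == label
    )
    return matches / len(gold)


# Hypothetical gold-standard labels and three independent annotations per image.
gold = {
    "img1": "flowers_present",
    "img2": "flowers_absent",
    "img3": "flowers_present",
}
annotations = {
    "img1": ["flowers_present", "flowers_present", "flowers_absent"],
    "img2": ["flowers_absent", "flowers_present", "flowers_absent"],
    "img3": ["flowers_absent", "flowers_present", "flowers_present"],
}

# Consensus label per image versus the first annotator alone.
consensus = {img: majority_vote(labels) for img, labels in annotations.items()}
single = {img: labels[0] for img, labels in annotations.items()}

print("single annotator accuracy:", accuracy(single, gold))
print("majority-vote accuracy:", accuracy(consensus, gold))
```

In this toy data each annotator makes one error on a different image, so the consensus recovers the gold label everywhere. Real annotation errors are often correlated (for example, on hard taxa or phenophases), in which case voting helps less.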
Pages: 9