automated classification;
citizen science;
crowdsourcing;
false-positive error;
misclassification;
remote camera;
species distribution model;
DATA QUALITY;
CAMERA TRAPS;
ERROR;
BIODIVERSITY;
CHALLENGES;
VOLUNTEER;
MODELS;
STATE;
TOOL;
DOI:
10.1002/eap.1849
Chinese Library Classification number:
Q14 [Ecology (Bioecology)];
Discipline classification code:
071012;
0713;
Abstract:
Measurement or observation error is common in ecological data. As citizen scientists and automated algorithms play larger roles in processing growing volumes of data to address problems at large scales, concerns about data quality and strategies for improving it have received greater focus. However, practical guidance on the fundamental data quality questions facing data users or managers (how accurate do data need to be, and what is the best or most efficient way to improve them?) remains limited. We present a generalizable framework for evaluating data quality and identifying remediation practices. We demonstrate the framework using trail camera images classified through crowdsourcing, determining acceptable rates of misclassification and identifying optimal remediation strategies for analysis using occupancy models. We used expert validation to estimate baseline classification accuracy and simulation to determine the sensitivity of two occupancy estimators (standard and false-positive extensions) to different empirical misclassification rates. We used regression techniques to identify important predictors of misclassification and prioritize remediation strategies. More than 93% of images were accurately classified, but simulation results suggested that most species were not identified accurately enough to permit distribution estimation at our predefined threshold for accuracy (<5% absolute bias). A model developed to screen incorrect classifications predicted misclassified images with >97% accuracy, enough to meet our accuracy threshold. Occupancy models that accounted for false-positive error provided still more accurate inference, even at high rates of misclassification (30%). As simulation suggested occupancy models were less sensitive to additional false-negative error, screening models or fitting occupancy models accounting for false-positive error emerged as efficient data remediation solutions.
Combining simulation-based sensitivity analysis with empirical estimation of baseline error and its variability allows users and managers of potentially error-prone data to identify and fix problematic data more efficiently. It may be particularly helpful for "big data" efforts dependent upon citizen scientists or automated classification algorithms with many downstream users, but given the ubiquity of observation or measurement error, even conventional studies may benefit from focusing more attention upon data quality.
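The simulation-based sensitivity analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the parameter values (occupancy 0.4, detection 0.5, 300 sites, 5 visits) and the use of a per-visit false-positive probability are all assumptions made for illustration. It simulates detection histories contaminated by false positives, fits a standard occupancy model that ignores them (by a coarse grid-search maximum likelihood), and reports the absolute bias in estimated occupancy, which can be compared with the 5% threshold mentioned above.

```python
import math
import random


def simulate_detections(n_sites, n_visits, psi, p, fp, rng):
    """Return per-site detection counts; fp is a hypothetical per-visit
    false-positive probability that applies at every site."""
    counts = []
    for _ in range(n_sites):
        occupied = rng.random() < psi
        d = 0
        for _ in range(n_visits):
            true_det = occupied and rng.random() < p
            false_det = rng.random() < fp
            if true_det or false_det:
                d += 1
        counts.append(d)
    return counts


def fit_standard_occupancy(counts, n_visits, grid=49):
    """Grid-search MLE of (psi, p) for a constant-psi, constant-p occupancy
    model that assumes no false positives."""
    hist = [0] * (n_visits + 1)
    for d in counts:
        hist[d] += 1
    best_ll, best_psi, best_p = -math.inf, None, None
    for i in range(1, grid + 1):
        psi = i / (grid + 1)
        for j in range(1, grid + 1):
            p = j / (grid + 1)
            ll = 0.0
            for d, n in enumerate(hist):
                if n == 0:
                    continue
                # Occupied-site term (binomial coefficient omitted: constant).
                lik = psi * p ** d * (1 - p) ** (n_visits - d)
                if d == 0:
                    lik += 1 - psi  # all-zero histories may be unoccupied
                ll += n * math.log(lik)
            if ll > best_ll:
                best_ll, best_psi, best_p = ll, psi, p
    return best_psi, best_p


def absolute_bias(fp, psi=0.4, p=0.5, n_sites=300, n_visits=5,
                  n_reps=20, seed=1):
    """Absolute bias of mean estimated occupancy across replicates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        counts = simulate_detections(n_sites, n_visits, psi, p, fp, rng)
        estimates.append(fit_standard_occupancy(counts, n_visits)[0])
    return abs(sum(estimates) / len(estimates) - psi)


if __name__ == "__main__":
    for fp in (0.0, 0.02, 0.05, 0.10):
        print(f"false-positive rate {fp:.2f}: "
              f"absolute bias {absolute_bias(fp):.3f}")
```

Sweeping the false-positive rate in this way gives a rough picture of where a standard estimator crosses a chosen bias threshold. A fuller analysis would also fit a false-positive occupancy model and use empirically estimated, species-specific error rates.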