Outlier detection methods to improve the quality of citizen science data

Cited by: 10
Authors
Li, Jennifer S. [1]
Hamann, Andreas [1]
Beaubien, Elisabeth [1]
Affiliations
[1] Univ Alberta, Dept Renewable Resources, Fac Agr Life & Environm Sci, 751 Gen Serv Bldg, Edmonton, AB T6G 2H1, Canada
Keywords
Citizen science; Data cleaning; Outlier detection; Data management; Plant phenology; Climate change; PLANT PHENOLOGY; ALBERTA; KNOWLEDGE; TOOL;
DOI
10.1007/s00484-020-01968-z
Chinese Library Classification number
Q6 [Biophysics]
Discipline classification code
071011
Abstract
Citizen science involves public participation in research, usually through volunteer observation and reporting. Data collected by citizen scientists are a valuable resource in many fields of research that require long-term observations at large geographic scales. However, such data may be perceived as less accurate than those collected by trained professionals. Here, we analyze the quality of data from a plant phenology network, which tracks biological response to climate change. We apply five algorithms designed to detect outlier observations or inconsistent observers. These methods rely on different quantitative approaches, including residuals of linear models, correlations among observers, deviations from multivariate clusters, and percentile-based outlier removal. We evaluated these methods by comparing the resulting cleaned datasets in terms of time series means, spatial data coverage, and spatial autocorrelations after outlier removal. Spatial autocorrelations were used to determine the efficacy of outlier removal, as they are expected to increase if outliers and inconsistent observations are successfully removed. All data cleaning methods resulted in better Moran's I autocorrelation statistics, with percentile-based outlier removal and the clustering method showing the greatest improvement. Methods based on residual analysis of linear models had the strongest impact on the final bloom time mean estimates, but were among the weakest based on autocorrelation analysis. Removing entire sets of observations from potentially unreliable observers proved least effective. In conclusion, percentile-based outlier removal emerges as a simple and effective method to improve reliability of citizen science phenology observations.
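The idea of percentile-based outlier removal, checked with Moran's I, can be illustrated with a minimal sketch. The abstract does not give the authors' percentile thresholds, grouping scheme, or spatial weights, so the 2.5th/97.5th percentile band, the inverse-distance weights, the function names, and the synthetic bloom dates below are all assumptions made for illustration only.

```python
import numpy as np


def percentile_filter(values, lower=2.5, upper=97.5):
    """Return a boolean mask that is True for values inside the percentile band.

    The default band (2.5 to 97.5) is an illustrative assumption.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower, upper])
    return (values >= lo) & (values <= hi)


def morans_i(values, coords):
    """Global Moran's I using inverse-distance weights (an assumed weighting)."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    n = values.size
    # Pairwise Euclidean distances between observation sites.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Inverse-distance weights, with zeros on the diagonal.
    weights = np.zeros_like(dist)
    nonzero = dist > 0
    weights[nonzero] = 1.0 / dist[nonzero]
    z = values - values.mean()
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)


# Synthetic example: 20 sites along a transect with a smooth gradient in
# bloom day-of-year, plus two grossly inconsistent reports.
coords = np.column_stack([np.arange(20.0), np.zeros(20)])
bloom_day = 120.0 + 2.0 * coords[:, 0]
bloom_day[3] = 250.0   # implausibly late report
bloom_day[15] = 20.0   # implausibly early report

mask = percentile_filter(bloom_day)
i_before = morans_i(bloom_day, coords)
i_after = morans_i(bloom_day[mask], coords[mask])

print("observations kept:", int(mask.sum()), "of", bloom_day.size)
print("Moran's I before:", round(float(i_before), 3))
print("Moran's I after: ", round(float(i_after), 3))
```

In this toy case the two inconsistent reports fall outside the band and are dropped, and Moran's I rises afterwards, which mirrors the reasoning in the abstract that spatial autocorrelation should increase when outliers are successfully removed. A real phenology dataset would presumably need the filter applied within comparable groups (for example per species and year), which this sketch does not attempt.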
Pages: 1825-1833
Number of pages: 9