Building test data from real outbreaks for evaluating detection algorithms

Cited by: 3
Authors
Texier, Gaetan [1, 2]
Jackson, Michael L. [3]
Siwe, Leonel [4]
Meynard, Jean-Baptiste [5]
Deparis, Xavier [5]
Chaudet, Herve [2]
Affiliations
[1] Pasteur Ctr Cameroun, Yaounde, Cameroon
[2] Aix Marseille Univ, INSERM, IRD, UMR 912, SESSTIM, Fac Med, 27, Bd Jean Moulin, Marseille, France
[3] Grp Hlth Res Inst, Seattle, WA USA
[4] ISSEA, Yaounde, Cameroon
[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France
Source
PLOS ONE | 2017 / Vol. 12 / Issue 09
Keywords
INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL;
DOI
10.1371/journal.pone.0183992
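Chinese Library Classification (CLC) code
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract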
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
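The two-step idea in the abstract (rescale a historical outbreak curve, then resample cases from it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `rescale_curve` and `simulate_outbreak`, the use of linear interpolation as a stand-in for the homothetic rescaling of the time axis, and the sample counts are all assumptions made for illustration. Only the inverse transform sampling step is shown; the Metropolis-Hastings and Gibbs variants are not covered.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(daily_counts, target_days):
    """Stretch or shrink a historical daily case curve to target_days.

    Assumption: linear interpolation along the time axis stands in for the
    homothetic transformation named in the abstract. Returns a probability
    distribution over the new days.
    """
    n = len(daily_counts)
    if n == 0 or target_days <= 0:
        raise ValueError("need a non-empty curve and a positive duration")
    if target_days == 1:
        return [1.0]
    weights = []
    for d in range(target_days):
        # Map the new day index onto the historical time axis.
        pos = d * (n - 1) / (target_days - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        weights.append((1 - frac) * daily_counts[lo] + frac * daily_counts[hi])
    total = sum(weights)
    if total == 0:
        raise ValueError("rescaled curve has no mass")
    return [w / total for w in weights]


def simulate_outbreak(daily_counts, target_days, target_cases, rng=None):
    """Draw target_cases onset days by inverse transform sampling."""
    rng = rng or random.Random()
    probs = rescale_curve(daily_counts, target_days)
    cdf = list(accumulate(probs))
    cdf[-1] = 1.0  # guard against floating-point rounding in the last bin
    simulated = [0] * target_days
    for _ in range(target_cases):
        u = rng.random()  # uniform draw in [0, 1)
        day = bisect.bisect_right(cdf, u)  # first day whose CDF exceeds u
        simulated[day] += 1
    return simulated


if __name__ == "__main__":
    historical = [1, 4, 9, 6, 2]  # hypothetical daily counts, not real data
    print(simulate_outbreak(historical, target_days=10, target_cases=200,
                            rng=random.Random(0)))
```

In this sketch the overall scale factor mentioned in the abstract corresponds to `target_cases / target_days`. When it falls below 1, most simulated days receive zero cases, which gives an intuition for the loss of information the authors report.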
Pages: 17
Related papers
50 in total
  • [21] Evaluation of the application of sequence data to the identification of outbreaks of disease using anomaly detection methods
    José Manuel Díaz-Cao
    Xin Liu
    Jeonghoon Kim
    Maria Jose Clavijo
    Beatriz Martínez-López
    Veterinary Research, 54
  • [22] INTELLIGENT REAL TIME DATA INTERPRETATION OF ROCKET TEST DATA TO IDENTIFY SUDDEN TRANSITIONS
    Mahajan, Ajay
    Oesch, Christopher
    Figueroa, Fernando
    PROCEEDINGS OF THE ASME DYNAMIC SYSTEMS AND CONTROL CONFERENCE 2010, VOL 2, 2010, : 719 - 726
  • [23] Orbivirus detection from Culicoides collected on African horse sickness outbreaks in Namibia
    Goffredo, Maria
    Savini, Giovanni
    Quaglia, Michela
    Molini, Umberto
    Federici, Valentina
    Catalani, Monica
    Portanti, Ottavio
    Marini, Valeria
    Florentius, Maseke Adrianus
    Pini, Attilio
    Scacchia, Massimo
    VETERINARIA ITALIANA, 2015, 51 (01) : 17 - 23
  • [24] Outlier Detection Algorithms Over Fuzzy Data with Weighted Least Squares
    Nikolova, Natalia
    Rodriguez, Rosa M.
    Symes, Mark
    Toneva, Daniela
    Kolev, Krasimir
    Tenekedjiev, Kiril
    INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2021, 23 (05) : 1234 - 1256
  • [25] Assessment of Data Fusion Algorithms for Earth Observation Change Detection Processes
    Molina, Inigo
    Martinez, Estibaliz
    Morillo, Carmen
    Velasco, Jesus
    Jara, Alvaro
    SENSORS, 2016, 16 (10)
  • [26] Synthetic Data Resource and Benchmarks for Time Cell Analysis and Detection Algorithms
    Ananthamurthy, Kambadur G.
    Bhalla, Upinder S.
    ENEURO, 2023, 10 (03) : 21 - 21
  • [27] Evaluating modularity in morphometric data: challenges with the RV coefficient and a new test measure
    Adams, Dean C.
    METHODS IN ECOLOGY AND EVOLUTION, 2016, 7 (05) : 565 - 572
  • [28] Cross Domain Data Generation for Smart Building Fault Detection and Diagnosis
    Li, Dan
    Xu, Yudong
    Zhou, Yuxun
    Gou, Chao
    Ng, See-Kiong
    MATHEMATICS, 2022, 10 (21)
  • [29] Evaluating the Discrete Generalized Rayleigh Distribution: Statistical Inferences and Applications to Real Data Analysis
    Ahmad, Hanan Haj
    Ramadan, Dina A.
    Almetwally, Ehab M.
    MATHEMATICS, 2024, 12 (02)
  • [30] A New Statistical Method to Detect Disease Outbreaks from Hospital Emergency Department Data
    Yoon, Jin
    Boyle, Justin
    MEDINFO 2023 - THE FUTURE IS ACCESSIBLE, 2024, 310 : 886 - 890