Building test data from real outbreaks for evaluating detection algorithms

Cited by: 3
Authors
Texier, Gaetan [1, 2]
Jackson, Michael L. [3]
Siwe, Leonel [4]
Meynard, Jean-Baptiste [5]
Deparis, Xavier [5]
Chaudet, Herve [2]
Affiliations
[1] Pasteur Ctr Cameroun, Yaounde, Cameroon
[2] Aix Marseille Univ, INSERM, IRD, UMR 912,SESSTIM,Fac Med, 27,Bd Jean Moulin, Marseille, France
[3] Grp Hlth Res Inst, Seattle, WA USA
[4] ISSEA, Yaounde, Cameroon
[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France
Source
PLOS ONE | 2017 | Volume 12 | Issue 09
Keywords
INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL;
DOI
10.1371/journal.pone.0183992
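Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code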
07 ; 0710 ; 09 ;
Abstract
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes and durations, is known to be very difficult. The dataset of generated outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms the influence of the type of algorithm used and of the simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased as the number of days simulated increased, and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
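The two-step idea in the abstract (rescale a historical curve, then resample cases from it) can be illustrated with a minimal Python sketch of the ITSM variant. This is not the authors' implementation: the abstract does not specify how the homothetic transformation is computed, so the linear interpolation along the time axis used here is an assumption, and the function names `rescale_curve` and `sample_outbreak_itsm` as well as the example counts are hypothetical.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(daily_counts, n_days):
    """Stretch or shrink a historical epidemic curve to n_days.

    Assumption: the time axis is rescaled by linear interpolation.
    Returns one probability per simulated day (the values sum to 1).
    """
    m = len(daily_counts)
    values = []
    for j in range(n_days):
        # Map simulated day j onto the historical time axis [0, m - 1].
        x = j * (m - 1) / (n_days - 1) if n_days > 1 else 0.0
        lo = int(x)
        hi = min(lo + 1, m - 1)
        frac = x - lo
        values.append((1 - frac) * daily_counts[lo] + frac * daily_counts[hi])
    total = sum(values)
    if total == 0:
        raise ValueError("historical curve contains no cases")
    return [v / total for v in values]


def sample_outbreak_itsm(probs, n_cases, rng):
    """Draw n_cases onset days by inverse transform sampling.

    Each uniform draw u is mapped to the first day whose cumulative
    probability exceeds u, so day i is chosen with probability probs[i].
    """
    cdf = list(accumulate(probs))
    counts = [0] * len(probs)
    for _ in range(n_cases):
        u = rng.random()
        # Clamp guards against the last CDF value rounding just below 1.
        day = min(bisect.bisect_right(cdf, u), len(probs) - 1)
        counts[day] += 1
    return counts


# Hypothetical 8-day historical outbreak, simulated as 60 cases over 12 days.
historical = [1, 3, 8, 12, 7, 4, 2, 1]
probs = rescale_curve(historical, n_days=12)
simulated = sample_outbreak_itsm(probs, n_cases=60, rng=random.Random(0))
scale_factor = sum(simulated) / len(simulated)  # cases per simulated day
```

Here `scale_factor` follows the abstract's reading of the overall scale factor as cases relative to days of duration (5 in this example). Each case is drawn independently, which fits the abstract's remark that the binomial and ITSM methods are appropriate when dependency is of little importance.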
Pages: 17