Building test data from real outbreaks for evaluating detection algorithms

Cited by: 4
Authors
Texier, Gaetan [1, 2]
Jackson, Michael L. [3]
Siwe, Leonel [4]
Meynard, Jean-Baptiste [5]
Deparis, Xavier [5]
Chaudet, Herve [2]
Affiliations
[1] Pasteur Ctr Cameroun, Yaounde, Cameroon
[2] Aix Marseille Univ, INSERM, IRD, UMR 912,SESSTIM,Fac Med, 27,Bd Jean Moulin, Marseille, France
[3] Grp Hlth Res Inst, Seattle, WA USA
[4] ISSEA, Yaounde, Cameroon
[5] French Armed Forces Ctr Epidemiol & Publ Hlth CES, Marseille, France
Keywords
INCUBATION PERIOD; INFECTIOUS-DISEASES; MARKOV-CHAINS; SURVEILLANCE; INFORMATION; DIVERGENCE; MODEL
DOI
10.1371/journal.pone.0183992
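Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09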
Abstract
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, sizes, and durations, is known to be very difficult. The dataset of generated outbreak signals should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach that uses historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by a resampling process (Binomial, Inverse Transform Sampling Method (ITSM), Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate the performance of each resampling algorithm. Our analysis confirms the influence of the type of algorithm used and of the simulation parameters (i.e., days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms, and metrics chosen for the evaluation, simulation quality decreased as the number of days simulated increased and improved as the number of cases simulated increased. Simulating outbreaks with fewer cases than days of duration (i.e., an overall scale factor less than 1) resulted in a substantial loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, the binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within the range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
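As a rough illustration of the resampling idea summarized in the abstract, the sketch below rescales a historical epidemic curve to a target duration and then draws case onset days by inverse transform sampling. The historical counts, the function names (rescale_curve, simulate_outbreak), and the use of linear interpolation as a simple stand-in for the homothetic transformation are all assumptions made for illustration; none of them is taken from the paper, and the paper's actual procedure may differ.

```python
import bisect
import random
from itertools import accumulate


def rescale_curve(daily_counts, target_days):
    """Stretch or shrink a curve to target_days by linear interpolation
    (an assumed stand-in for the homothetic transformation), then
    normalize it so the values sum to 1."""
    n = len(daily_counts)
    if target_days < 1 or sum(daily_counts) <= 0:
        raise ValueError("need at least one day and one historical case")
    if target_days == 1 or n == 1:
        resampled = [float(sum(daily_counts))] * target_days
    else:
        resampled = []
        for i in range(target_days):
            # Position of simulated day i on the historical time axis.
            pos = i * (n - 1) / (target_days - 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            resampled.append(daily_counts[lo] * (1 - frac) + daily_counts[hi] * frac)
    total = sum(resampled)
    return [v / total for v in resampled]


def simulate_outbreak(daily_counts, target_days, target_cases, rng):
    """Draw target_cases onset days by inverse transform sampling
    from the rescaled curve and return the simulated daily counts."""
    probs = rescale_curve(daily_counts, target_days)
    cdf = list(accumulate(probs))
    simulated = [0] * target_days
    for _ in range(target_cases):
        u = rng.random()
        # Smallest day whose cumulative probability exceeds u;
        # min() guards against floating-point shortfall in cdf[-1].
        day = min(bisect.bisect_right(cdf, u), target_days - 1)
        simulated[day] += 1
    return simulated


# Hypothetical historical epidemic curve (cases per day), not real data.
historical = [1, 3, 8, 15, 9, 4, 2, 1]
signal = simulate_outbreak(historical, target_days=12, target_cases=60,
                           rng=random.Random(0))
print(signal)
```

Here the overall scale factor mentioned in the abstract would correspond to target_cases divided by target_days (60 / 12 = 5 in this hypothetical run). Choosing target_cases smaller than target_days gives a factor below 1, the regime the abstract associates with a loss of information.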
Pages: 17