Anomaly Detection in Endemic Disease Surveillance Data Using Machine Learning Techniques

Cited by: 10
Authors
Eze, Peter U. [1]
Geard, Nicholas [1]
Mueller, Ivo [2]
Chades, Iadine [3]
Affiliations
[1] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3010, Australia
[2] Walter & Eliza Hall Inst Med Res, Parkville, Vic 3052, Australia
[3] CSIRO, Ecosci Precinct, Dutton Pk, Qld 4102, Australia
Funding
National Health and Medical Research Council of Australia (NHMRC);
Keywords
anomaly detection; malaria; machine learning; big data; ALGORITHM;
DOI
10.3390/healthcare11131896
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services management)];
Abstract
Disease surveillance is used to monitor ongoing control activities, detect early outbreaks, and inform intervention priorities and policies. However, data from disease surveillance that could be used to support real-time decision-making remain largely underutilised. Using the Brazilian Amazon malaria surveillance dataset as a case study, in this paper we explore the potential for unsupervised anomaly detection machine learning techniques to discover signals of epidemiological interest. We found that our models were able to provide an early indication of outbreak onset, outbreak peaks, and change points in the proportion of positive malaria cases. Specifically, the sustained rise in malaria in the Brazilian Amazon in 2016 was flagged by several models. We found that no single model detected all anomalies across all health regions. Because of this, we provide the minimum number of machine learning models (top-k models) to maximise the number of anomalies detected across different health regions. We discovered that the top three models that maximise the coverage of the number and types of anomalies detected across the thirteen health regions are principal component analysis, stochastic outlier selection, and the minimum covariance determinant. Anomaly detection is a potentially valuable approach to discovering patterns of epidemiological importance when confronted with a large volume of data across space and time. Our exploratory approach can be replicated for other diseases and locations to inform monitoring, timely interventions, and actions towards the goal of controlling endemic disease.
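The idea of choosing a small set of top-k models that together cover as many detected anomalies as possible can be illustrated with a greedy maximum-coverage heuristic. The sketch below is only an illustration under stated assumptions: the abstract does not say how the authors selected their models, and the function name `greedy_top_k`, the model names, and the toy detections are all hypothetical.

```python
def greedy_top_k(detections, k):
    """Greedily pick up to k models that together cover the most anomalies.

    `detections` maps a model name to the set of anomalies it flagged.
    This is a generic max-coverage heuristic, not the paper's procedure.
    """
    chosen, covered = [], set()
    for _ in range(k):
        # Pick the unchosen model that adds the most not-yet-covered anomalies.
        best = max(
            (m for m in detections if m not in chosen),
            key=lambda m: len(detections[m] - covered),
            default=None,
        )
        if best is None or not (detections[best] - covered):
            break  # no remaining model adds anything new
        chosen.append(best)
        covered |= detections[best]
    return chosen, covered


# Hypothetical toy data: (health_region, month) pairs flagged by each model.
detections = {
    "model_a": {("R1", "2016-03"), ("R2", "2016-05"), ("R3", "2016-07")},
    "model_b": {("R2", "2016-05"), ("R4", "2016-08")},
    "model_c": {("R5", "2016-09")},
    "model_d": {("R1", "2016-03")},
}

chosen, covered = greedy_top_k(detections, k=3)
print(chosen, len(covered))
```

With this toy input, `model_d` is never selected because everything it flags is already covered by `model_a`, which mirrors the abstract's point that a few complementary models can cover what no single model detects.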
Pages: 17