Machine Learning and OLAP on Big COVID-19 Data

Cited by: 36
Authors
Leung, Carson K. [1]
Chen, Yubo [1]
Hoi, Calvin S. H. [1]
Shang, Siyuan [1]
Cuzzocrea, Alfredo [2]
Affiliations
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
[2] Univ Calabria, Big Data Engn & Analyt Lab, Arcavacata Di Rende, Italy
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
big data; machine learning; online analytical processing; OLAP; data science; data analytics; data mining; coronavirus disease; COVID-19; epidemiological data; EDITORIAL SPECIAL-ISSUE; FRAMEWORK; DIAGNOSIS;
DOI
10.1109/BigData50022.2020.9378407
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In the current technological era, huge amounts of big data are generated and collected from a wide variety of rich data sources. These big data can have different levels of veracity, in the sense that some are precise while others are imprecise and uncertain. Embedded in these big data are useful information and valuable knowledge to be discovered. An example is healthcare and epidemiological data, such as data on patients who have suffered from epidemic diseases like coronavirus disease 2019 (COVID-19). Knowledge discovered from these epidemiological data via data science techniques such as machine learning, data mining, and online analytical processing (OLAP) helps researchers, epidemiologists, and policy makers get a better understanding of the disease, which may inspire them to come up with ways to detect, control, and combat it. In this paper, we present a machine learning and big data analytic tool for processing and analyzing COVID-19 epidemiological data. Specifically, the tool uses taxonomy and OLAP to generalize specific attributes into generalized attributes for effective big data analytics. Instead of ignoring unknown or unstated attribute values, the tool gives users the flexibility to include or exclude these values, depending on their preferences and applications. Moreover, the tool discovers frequent patterns and their related patterns, which help reveal useful knowledge such as the absolute and relative frequencies of the patterns. Furthermore, the tool learns from patterns discovered in historical data and predicts useful information, such as clinical outcomes, for future data. As such, the tool helps users get a better understanding of the confirmed cases of COVID-19.
Although this tool is designed for machine learning and analytics of big epidemiological data, it would also be applicable to machine learning and analytics of big data in many other real-life applications and services.
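The abstract's first three steps (taxonomy-based generalization, optional handling of unknown values, and frequent pattern counting with absolute and relative frequencies) can be illustrated with a minimal sketch. This is not the authors' tool: the `age_group` taxonomy, the attribute names, the `CASES` records, and the `frequent_patterns` helper are all hypothetical and made up for illustration, and the brute-force enumeration stands in for whatever mining algorithm the paper actually uses. The prediction step is not covered.

```python
from collections import Counter
from itertools import combinations


def age_group(age):
    """Hypothetical taxonomy: roll a specific age up to a decade group."""
    if age is None:
        return "unknown"
    decade = (age // 10) * 10
    return "80+" if decade >= 80 else f"{decade}-{decade + 9}"


# Made-up toy records for illustration only; not data from the paper.
CASES = [
    {"age": 34, "gender": "F", "hospitalized": "no"},
    {"age": 37, "gender": "M", "hospitalized": "no"},
    {"age": 85, "gender": "F", "hospitalized": "yes"},
    {"age": None, "gender": "M", "hospitalized": "yes"},
    {"age": 82, "gender": None, "hospitalized": "yes"},
]


def generalize(case):
    """Replace specific values with generalized ones; mark missing as 'unknown'."""
    return {
        "age_group": age_group(case["age"]),
        "gender": case["gender"] if case["gender"] is not None else "unknown",
        "hospitalized": case["hospitalized"],
    }


def frequent_patterns(cases, min_count=2, include_unknown=True):
    """Count every attribute-value combination by brute force.

    Returns {pattern: (absolute_frequency, relative_frequency)} for patterns
    that occur in at least `min_count` cases. When `include_unknown` is False,
    'unknown' values are dropped before counting.
    """
    counts = Counter()
    for case in map(generalize, cases):
        items = sorted(
            (attr, value)
            for attr, value in case.items()
            if include_unknown or value != "unknown"
        )
        for size in range(1, len(items) + 1):
            counts.update(combinations(items, size))
    total = len(cases)
    return {
        pattern: (count, count / total)
        for pattern, count in counts.items()
        if count >= min_count
    }


if __name__ == "__main__":
    for pattern, (absolute, relative) in sorted(frequent_patterns(CASES).items()):
        print(pattern, absolute, f"{relative:.0%}")
```

Calling `frequent_patterns(CASES, include_unknown=False)` drops the unknown values before counting, which mirrors the include-or-exclude choice the abstract describes. Enumerating all combinations is exponential in the number of attributes, so this only suits a handful of generalized attributes.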
Pages: 5118-5127
Number of pages: 10