Machine Learning and OLAP on Big COVID-19 Data

Cited by: 36
Authors
Leung, Carson K. [1]
Chen, Yubo [1]
Hoi, Calvin S. H. [1]
Shang, Siyuan [1]
Cuzzocrea, Alfredo [2]
Affiliations
[1] Univ Manitoba, Dept Comp Sci, Winnipeg, MB, Canada
[2] Univ Calabria, Big Data Engn & Analyt Lab, Arcavacata Di Rende, Italy
Source
2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2020
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
big data; machine learning; online analytical processing; OLAP; data science; data analytics; data mining; coronavirus disease; COVID-19; epidemiological data; EDITORIAL SPECIAL-ISSUE; FRAMEWORK; DIAGNOSIS;
DOI
10.1109/BigData50022.2020.9378407
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the current technological era, huge amounts of big data are generated and collected from a wide variety of rich data sources. These big data can have different levels of veracity: some are precise, while others are imprecise and uncertain. Embedded in these big data are useful information and valuable knowledge to be discovered. One example is healthcare and epidemiological data, such as data on patients who suffered from epidemic diseases like coronavirus disease 2019 (COVID-19). Knowledge discovered from these epidemiological data via data science techniques such as machine learning, data mining, and online analytical processing (OLAP) helps researchers, epidemiologists, and policy makers get a better understanding of the disease, which may inspire them to come up with ways to detect, control, and combat it. In this paper, we present a machine learning and big data analytic tool for processing and analyzing COVID-19 epidemiological data. Specifically, the tool uses taxonomy and OLAP to generalize specific attributes into more general ones for effective big data analytics. Instead of ignoring unknown or unstated attribute values, the tool gives users the flexibility to include or exclude these values, depending on their preferences and applications. Moreover, the tool discovers frequent patterns and their related patterns, which help reveal useful knowledge such as the absolute and relative frequencies of the patterns. Furthermore, the tool learns from patterns discovered in historical data and predicts useful information, such as clinical outcomes, for future data. As such, the tool helps users get a better understanding of the confirmed cases of COVID-19. Although this tool is designed for machine learning and analytics of big epidemiological data, it would be applicable to machine learning and analytics of big data in many other real-life applications and services.
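The ideas in the abstract (taxonomy-based generalization, optional handling of unknown values, and frequent patterns reported with absolute and relative frequency) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The attribute names, the ten-year age taxonomy, the toy records, and the helper functions `generalize_age` and `frequent_patterns` are all hypothetical, and the pattern counting uses plain exhaustive enumeration rather than any particular mining algorithm from the paper.

```python
from collections import Counter
from itertools import combinations


def generalize_age(age):
    """Roll a specific age up to a ten-year group (hypothetical taxonomy)."""
    if age is None:
        return "unknown"
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def frequent_patterns(records, min_count, include_unknown=True):
    """Count every attribute-value combination by brute force.

    Returns {pattern: (absolute_frequency, relative_frequency)} for the
    patterns that occur in at least min_count records.
    """
    counts = Counter()
    for rec in records:
        # Optionally drop attributes whose value is unknown or unstated.
        items = sorted(
            (attr, val)
            for attr, val in rec.items()
            if include_unknown or val != "unknown"
        )
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    total = len(records)
    return {p: (c, c / total) for p, c in counts.items() if c >= min_count}


# Made-up toy cases for illustration only; not data from the paper.
raw_cases = [
    {"age": 34, "transmission": "community", "outcome": "recovered"},
    {"age": 37, "transmission": "community", "outcome": "recovered"},
    {"age": 82, "transmission": None, "outcome": "deceased"},
    {"age": None, "transmission": "travel", "outcome": "recovered"},
]

# Generalize specific attributes and make missing values explicit.
cases = [
    {
        "age_group": generalize_age(c["age"]),
        "transmission": c["transmission"] or "unknown",
        "outcome": c["outcome"],
    }
    for c in raw_cases
]

patterns = frequent_patterns(cases, min_count=2)
for pattern, (absolute, relative) in sorted(patterns.items()):
    print(pattern, absolute, relative)
```

Setting `include_unknown=False` excludes unknown values from the patterns, mirroring the include-or-exclude choice the abstract describes. Exhaustive enumeration is only practical for a handful of attributes; a real tool would need a scalable frequent pattern mining method.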
Pages: 5118-5127
Number of pages: 10