Clustering and visualization of a high-dimensional diabetes dataset

Cited by: 1
Authors
Lasek, Piotr [1]
Mei, Zhen [2]
Affiliations
[1] Univ Rzeszow, Rzeszow, Poland
[2] Manifold Data Min Inc, Toronto, ON, Canada
Source
KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KES 2019) | 2019 / Vol. 159
Keywords
data mining; data visualization; interactive data visualization; diabetes; visualization; clustering; high-dimensional clustering
DOI
10.1016/j.procs.2019.09.392
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data clustering algorithms have proved to be important and widely used methods of artificial intelligence and data mining for discovering unknown yet important patterns in datasets. Nevertheless, an additional aspect of data clustering is the proper interpretation of the clustering results. In this paper we investigate the possibilities of using both data clustering and visualization methods to analyze a sample diabetes dataset. In the first part, we focus on how to cluster a high-dimensional sample dataset; we then concentrate on how to present the clustering results visually in the most meaningful way, to uncover potentially interesting behavioral patterns or features of diabetes patients. In this work we examine two clustering algorithms (DBSCAN, k-Means) along with several different distance measures. We also present sample visualizations of clustering results generated by an application which we have developed, and discuss whether the proposed way of visualizing clustering results can help in understanding the analyzed dataset and lead a viewer to draw valuable conclusions about it. © 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of KES International.
Pages: 2179-2188
Page count: 10