Progressive Clustering of Big Data with GPU Acceleration and Visualization

Cited by: 0
Authors
Wang, Jun [1]
Papenhausen, Eric [1]
Wang, Bing [1]
Ha, Sungsoo [1]
Zelenyuk, Alla [2]
Mueller, Klaus [1]
Affiliations
[1] SUNY Stony Brook, Dept Comp Sci, Visual Analyt & Imaging Lab, Stony Brook, NY 11794 USA
[2] Pacific Northwest Natl Lab, Chem & Mat Sci Div, Richland, WA USA
Source
2017 New York Scientific Data Summit (NYSDS) | 2017
Keywords
clustering; big data; GPU; visualization; software; algorithms
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering has become an unavoidable step in big data analysis. It can arrange data into a compact format, making operations on big data manageable. However, clustering big data requires not only the ability to handle large volumes and high dimensionality, but also the ability to process streaming data, and most current algorithms are underdeveloped in all three respects. Furthermore, big data processing is seldom interactive, which conflicts with the needs of users who seek immediate answers. The best one can do is to process the data incrementally, so that partial and, hopefully, accurate results become available relatively quickly and are then progressively refined over time. We propose a clustering framework that uses Multi-Dimensional Scaling for layout and GPU acceleration to accomplish these goals. Our domain application is the clustering of mass spectral data of individual aerosol particles, comprising 8 million data points of 450 dimensions each.
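The incremental idea in the abstract (partial results early, refined as more data arrives) can be illustrated with a minimal sketch. This is not the authors' framework: it assumes plain online k-means with a running-mean update, runs on the CPU in pure Python, and omits both the Multi-Dimensional Scaling layout and the GPU acceleration. The names `nearest`, `progressive_kmeans`, and `stream` are hypothetical.

```python
def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(centroids):
        d = sum((p - q) ** 2 for p, q in zip(point, c))
        if d < best_d:
            best, best_d = i, d
    return best


def progressive_kmeans(chunks, k):
    """Yield a snapshot of the centroids after each chunk of a data stream.

    Assumption: the first k points seen seed the centroids; every later
    point moves its nearest centroid toward it by 1/count, which keeps
    each centroid equal to the mean of the points assigned to it so far.
    """
    centroids, counts = [], []
    for chunk in chunks:
        for point in chunk:
            if len(centroids) < k:
                centroids.append(list(point))
                counts.append(1)
                continue
            j = nearest(point, centroids)
            counts[j] += 1
            rate = 1.0 / counts[j]
            centroids[j] = [c + rate * (p - c)
                            for c, p in zip(centroids[j], point)]
        # Partial result: usable now, refined by later chunks.
        yield [list(c) for c in centroids]


# Toy stream of three chunks with two well-separated groups (made-up data).
stream = [
    [(0.0, 0.0), (10.0, 10.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(0.0, 1.0), (10.0, 9.0)],
]
snapshots = list(progressive_kmeans(stream, k=2))
for step, cents in enumerate(snapshots, start=1):
    print(step, cents)
```

Each yielded snapshot is a usable intermediate clustering, so a front end could redraw after every chunk instead of waiting for the full data set. Because each point is visited only once, the result depends on arrival order and on the seed points, which is one reason a real system would need a more careful refinement step.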
Pages: 9