Parallel Framework for Dimensionality Reduction of Large-Scale Datasets

Cited by: 5
Authors
Samudrala, Sai Kiranmayee [1]
Zola, Jaroslaw [2, 3]
Aluru, Srinivas [4]
Ganapathysubramanian, Baskar [5]
Affiliations
[1] Georgia Inst Technol, Dept Mech Engn, Atlanta, GA 30080 USA
[2] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14620 USA
[3] SUNY Buffalo, Dept Biomed Informat, Buffalo, NY 14620 USA
[4] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
[5] Iowa State Univ, Dept Mech Engn, Ames, IA 50011 USA
Funding
U.S. National Science Foundation;
Keywords
LINEAR ALGEBRA; SOLAR-CELLS; MORPHOLOGY; ALGORITHMS; EIGENMAPS; SOFTWARE; DESIGN; FIT;
DOI
10.1155/2015/180214
Chinese Library Classification number
TP31 [Computer Software];
Subject classification code
081202; 0835;
Abstract
Dimensionality reduction refers to a set of mathematical techniques used to reduce the complexity of the original high-dimensional data while preserving selected properties of that data. Improvements in simulation strategies and experimental data collection methods are producing a deluge of heterogeneous, high-dimensional data, which often makes dimensionality reduction the only viable way to gain a qualitative and quantitative understanding of the data. However, existing dimensionality reduction software often does not scale to datasets arising in real-life applications, which may consist of thousands of points with millions of dimensions. In this paper, we propose a parallel framework for dimensionality reduction of large-scale data. We identify key components underlying the spectral dimensionality reduction techniques and propose their efficient parallel implementation. We show that the resulting framework can process datasets consisting of millions of points when executed on a 16,000-core cluster, which is beyond the reach of currently available methods. To further demonstrate the applicability of our framework, we perform dimensionality reduction of 75,000 images representing morphology evolution during the manufacturing of organic solar cells, in order to identify how processing parameters affect morphology evolution.
Pages: 12