Efficient and Portable Distribution Modeling for Large-Scale Scientific Data Processing with Data-Parallel Primitives

Cited by: 1
Authors
Yang, Hao-Yi [1]
Lin, Zhi-Rong [1]
Wang, Ko-Chih [1]
Affiliations
[1] Natl Taiwan Normal Univ, Dept Comp Sci & Informat Engn, Taipei 11677, Taiwan
Keywords
large-scale data processing; scientific dataset; distribution-based approach; parallel algorithm; data-parallel primitive; visualization
DOI
10.3390/a14100285
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The use of distribution-based data representation to handle large-scale scientific datasets is a promising approach. Distribution-based approaches often transform a scientific dataset into many distributions, each of which is calculated from a small number of samples. Most of the proposed parallel algorithms focus on modeling single distributions from many input samples efficiently, but these may not fit the large-scale scientific data processing scenario because they cannot utilize computing resources effectively. Histograms and the Gaussian Mixture Model (GMM) are the most popular distribution representations used to model scientific datasets. Therefore, we propose the use of multi-set histogram and GMM modeling algorithms for the scenario of large-scale scientific data processing. Our algorithms are developed by data-parallel primitives to achieve portability across different hardware architectures. We evaluate the performance of the proposed algorithms in detail and demonstrate use cases for scientific data processing.
Pages: 25
Related papers
39 in total
[1]  
[Anonymous], 1990, Vector models for data-parallel computing
[2]   Parallel Tensor Compression for Large-Scale Scientific Data [J].
Austin, Woody ;
Ballard, Grey ;
Kolda, Tamara G. .
2016 IEEE 30TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS 2016), 2016, :912-922
[3]   Extending machine learning classification capabilities with histogram reweighting [J].
Bachtis, Dimitrios ;
Aarts, Gert ;
Lucini, Biagio .
PHYSICAL REVIEW E, 2020, 102 (03)
[4]  
Bell N., 2012, Applications of GPU Computing Series, P359, DOI [DOI 10.1016/B978-0-12-385963-1.00026-5, 10.1016/B978-0-12-385963-1.00026-5]
[5]   A Study of Color Histogram Based Image Retrieval [J].
Chakravarti, Rishav ;
Meng, Xiannong .
PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, :1323-1328
[6]  
Chaudhuri Abon, 2013, 2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV), P125, DOI 10.1109/LDAV.2013.6675171
[7]   Efficient Range Distribution Query for Visualizing Scientific Data [J].
Chaudhuri, Abon ;
Wei, Tzu-Hsuan ;
Lee, Teng-Yok ;
Shen, Han-Wei ;
Peterka, Tom .
2014 IEEE PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS), 2014, :201-208
[8]  
Chen CM, 2015, IEEE PAC VIS SYMP, P215, DOI 10.1109/PACIFICVIS.2015.7156380
[9]   In Situ Prediction Driven Feature Analysis in Jet Engine Simulations [J].
Dutta, Soumya ;
Shen, Han-Wei ;
Chen, Jen-Ping .
2018 IEEE PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS), 2018, :66-75
[10]   In Situ Distribution Guided Analysis and Visualization of Transonic Jet Engine Simulations [J].
Dutta, Soumya ;
Chen, Chun-Ming ;
Heinlein, Gregory ;
Shen, Han-Wei ;
Chen, Jen-Ping .
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2017, 23 (01) :811-820