Scalable Aggregation Service for Satellite Remote Sensing Data

Cited by: 3
Authors
Wang, Jianwu [1]
Huang, Xin [1]
Zheng, Jianyu [2]
Rajapakshe, Chamara [2]
Kay, Savio [1]
Kandoor, Lakshmi [1]
Maxwell, Thomas [3]
Zhang, Zhibo [2]
Affiliations
[1] Univ Maryland Baltimore Cty, Dept Informat Syst, Baltimore, MD 21250 USA
[2] Univ Maryland Baltimore Cty, Dept Phys, Baltimore, MD 21250 USA
[3] NASA, Goddard Space Flight Ctr, Code 916, Greenbelt, MD 20771 USA
Source
ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2020, PT II | 2020 / Vol. 12453
Keywords
Big data; Data aggregation; Remote sensing; Servicelization; Benchmark
DOI
10.1007/978-3-030-60239-0_13
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With advances in satellite remote sensing techniques, we are receiving huge amounts of satellite observation data of the Earth. While the data greatly help Earth scientists in their research, processing and analyzing the data is becoming increasingly time-consuming and complicated. One common data processing task is to aggregate satellite observation data from the original pixel level to the latitude-longitude grid level, which makes it easier to obtain global information and to work with global climate models. This paper focuses on how best to aggregate NASA MODIS satellite data products from pixel level to grid level in a distributed environment and how to provision the aggregation capability as a service that Earth scientists can use easily. We propose three approaches to parallel data aggregation and implement them on three parallel platforms (Spark, Dask, and MPI). We run extensive experiments with these approaches and platforms on a local cluster to benchmark their differences in execution performance, and we discuss key factors for achieving good speedup. We also study how to make the provisioned service adaptable to different service libraries and protocols via a unified framework.
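As an illustration of the pixel-to-grid aggregation task the abstract describes, the following is a minimal single-process sketch in Python with NumPy. It is not the paper's implementation: the function name `aggregate_to_grid`, the 1-degree default cell size, and the choice of the mean as the aggregate statistic are all assumptions made for this example, and reading MODIS files is left out entirely.

```python
import numpy as np


def aggregate_to_grid(lat, lon, values, cell_deg=1.0):
    """Average pixel values into a regular latitude-longitude grid.

    Hypothetical helper for illustration only. Returns (mean, count),
    each shaped (n_lat, n_lon); cells with no pixels have mean = NaN.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)

    n_lat = int(round(180 / cell_deg))
    n_lon = int(round(360 / cell_deg))

    # Map each pixel to a grid cell; clip so lat = 90 and lon = 180
    # fall into the last cell instead of out of range.
    row = np.clip(((lat + 90.0) / cell_deg).astype(int), 0, n_lat - 1)
    col = np.clip(((lon + 180.0) / cell_deg).astype(int), 0, n_lon - 1)
    flat = row * n_lon + col

    # Per-cell sum and pixel count.
    total = np.bincount(flat, weights=values, minlength=n_lat * n_lon)
    count = np.bincount(flat, minlength=n_lat * n_lon)

    # Divide only where at least one pixel landed in the cell.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = np.where(count > 0, total / count, np.nan)
    return mean.reshape(n_lat, n_lon), count.reshape(n_lat, n_lon)


# Three toy pixels: two share a cell near (0.5N, 0.5E), one sits near the south pole.
mean, count = aggregate_to_grid(
    lat=[0.5, 0.7, -89.5], lon=[0.5, 0.2, -179.5], values=[1.0, 3.0, 10.0]
)
```

Because per-cell sums and counts are additive, partial results computed on separate subsets of the pixels can be added together before the final division, which is one reason this kind of aggregation lends itself to parallel execution.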
Pages: 184-199
Page count: 16