A Lightweight I/O Scheme to Facilitate Spatial and Temporal Queries of Scientific Data Analytics

Cited by: 0
Authors
Tian, Yuan [1,5]
Liu, Zhuo [1 ]
Klasky, Scott [2 ]
Wang, Bin [1 ]
Abbasi, Hasan [2 ]
Zhou, Shujia [3,4]
Podhorszki, Norbert [2 ]
Clune, Tom [3 ]
Logan, Jeremy [5 ]
Yu, Weikuan [1 ]
Affiliations
[1] Auburn Univ, Auburn, AL 36849 USA
[2] Oak Ridge Natl Lab, Oak Ridge, TN USA
[3] Goddard Space Flight Ctr, Greenbelt, MD USA
[4] Northrop Grumman Corp, Falls Church, VA USA
[5] Univ Tennessee Knoxville, Knoxville, TN USA
Source
2013 IEEE 29TH SYMPOSIUM ON MASS STORAGE SYSTEMS AND TECHNOLOGIES (MSST) | 2013
Keywords
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
In the era of petascale computing, more scientific applications are being deployed on leadership-scale computing platforms to enhance scientific productivity. Many I/O techniques have been designed to address the growing I/O bottleneck on large-scale systems by handling massive scientific data in a holistic manner. While such techniques have been leveraged in a wide range of applications, they have not been shown to be adequate for many mission-critical applications, particularly in the data post-processing stage. For example, some scientific applications generate datasets composed of a vast number of small data elements that are organized along many spatial and temporal dimensions but require sophisticated data analytics on one or more dimensions. Incorporating such dimensional knowledge into the data organization can improve the efficiency of data post-processing, yet this knowledge is often missing from existing I/O techniques. In this study, we propose a novel I/O scheme named STAR (Spatial and Temporal AggRegation) to enable high-performance data queries for scientific analytics. STAR is able to dive into the massive data, identify the spatial and temporal relationships among data variables, and accordingly organize them into an optimized multi-dimensional data structure before writing them to storage. This technique not only facilitates the common access patterns of data analytics, but also further reduces the application turnaround time. In particular, STAR enables efficient data queries along the time dimension, a practice common in scientific analytics but not yet supported by existing I/O techniques. In our case study with a critical climate modeling application, GEOS-5, the experimental results on the Jaguar supercomputer demonstrate an improvement of up to 73 times in read performance compared to the original I/O method.
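The benefit of organizing data along the time dimension can be illustrated with a small sketch. It is not the STAR implementation: the two layouts, the function names, and the grid sizes below are assumptions chosen for illustration. It compares how many contiguous extents must be read to fetch one grid point's full time series when each timestep is written as a separate block, versus when values for the same point are aggregated across timesteps.

```python
# Illustrative sketch only: the layouts and helper names are hypothetical
# and are not taken from the STAR implementation described in the abstract.

def offset_time_major(t, p, num_steps, num_points):
    """Element offset when each timestep is stored as one block of all points."""
    return t * num_points + p

def offset_point_major(t, p, num_steps, num_points):
    """Element offset when each point's values for all timesteps are stored together."""
    return p * num_steps + t

def contiguous_runs(offsets):
    """Count the contiguous extents needed to read the given element offsets."""
    ordered = sorted(offsets)
    if not ordered:
        return 0
    runs = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:
            runs += 1
    return runs

def time_series_runs(layout, point, num_steps, num_points):
    """Extents touched when reading one point across all timesteps."""
    offsets = [layout(t, point, num_steps, num_points) for t in range(num_steps)]
    return contiguous_runs(offsets)

# Assumed sizes: 24 timesteps, 1000 grid points, querying point 7 over time.
num_steps, num_points = 24, 1000
runs_time_major = time_series_runs(offset_time_major, 7, num_steps, num_points)
runs_point_major = time_series_runs(offset_point_major, 7, num_steps, num_points)
print(runs_time_major, runs_point_major)
```

In the per-timestep layout every timestep contributes a separate extent, so the number of reads grows with the length of the time series; in the aggregated layout the same query is a single contiguous read. A real scheme would also have to balance this against spatial access patterns, which this sketch does not model.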
Pages: 10