Data-intensive science: The Terapixel and MODISAzure projects

Cited by: 11
Authors
Agarwal, Deb [2 ]
Cheah, You-Wei [3 ]
Fay, Dan [1 ]
Fay, Jonathan
Guo, Dean
Hey, Tony
Humphrey, Marty [4 ]
Jackson, Keith
Li, Jie [4 ]
Poulain, Christophe
Ryu, Youngryel [5 ]
van Ingen, Catharine [1 ]
Affiliations
[1] Microsoft Res, eScience Grp, Redmond, WA 98052 USA
[2] Lawrence Berkeley Natl Lab, Adv Comp Sci Dept, Head & Data Intens Syst Grp Lead, Berkeley, CA USA
[3] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405 USA
[4] Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903 USA
[5] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
Keywords
cloud computing; data-intensive science; massive datasets; MODISAzure; Terapixel
DOI
10.1177/1094342011414746
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We live in an era in which scientific discovery is increasingly driven by data exploration of massive datasets. Scientists today are envisioning diverse data analyses and computations that scale from the desktop to supercomputers, yet often have difficulty designing and constructing software architectures to accommodate the heterogeneous and often inconsistent data at scale. Moreover, scientific data and computational resource needs can vary widely over time. The needs grow as the science collaboration broadens or as additional data is accumulated; the computational demand can have large transients in response to seasonal field campaigns or new instrumentation breakthroughs. Cloud computing can offer a scalable, economic, on-demand model that is well matched to some of these evolving science needs. This paper presents two of our experiences over the last year - the Terapixel Project, using workflow, high-performance computing and non-structured query language data processing to render the largest astronomical image for the WorldWide Telescope, and MODISAzure, a science pipeline for image processing, deployed using the Azure Cloud infrastructure.
Pages: 304-316
Number of pages: 13
Related papers
13 in total
  • [1] [Anonymous], 2009, Microsoft Research
  • [2] [Anonymous], P S OP SYST DES IMPL
  • [3] Breathing of the terrestrial biosphere: lessons learned from a global network of carbon dioxide flux measurement systems
    Baldocchi, Dennis
    [J]. AUSTRALIAN JOURNAL OF BOTANY, 2008, 56 (01) : 1 - 26
  • [4] Beyond the Data Deluge
    Bell, Gordon
    Hey, Tony
    Szalay, Alex
    [J]. SCIENCE, 2009, 323 (5919) : 1297 - 1298
  • [5] An overview of MODIS capabilities for ocean science observations
    Esaias, WE
    Abbott, MR
    Barton, I
    Brown, OB
    Campbell, JW
    Carder, KL
    Clark, DK
    Evans, RH
    Hoge, FE
    Gordon, HR
    Balch, WM
    Letelier, R
    Minnett, PJ
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 1998, 36 (04) : 1250 - 1265
  • [6] The Digitized Second Palomar Observatory Sky Survey (DPOSS). II. Photometric calibration
    Gal, RR
    de Carvalho, RR
    Odewahn, SC
    Djorgovski, SG
    Mahabal, A
    Brunner, RJ
    [J]. ASTRONOMICAL JOURNAL, 2004, 128 (06) : 3082 - 3091
  • [7] Overview of the radiometric and biophysical performance of the MODIS vegetation indices
    Huete, A
    Didan, K
    Miura, T
    Rodriguez, EP
    Gao, X
    Ferreira, LG
    [J]. REMOTE SENSING OF ENVIRONMENT, 2002, 83 (1-2) : 195 - 213
  • [8] Isard M., 2007, P EUR C COMP SYST EU
  • [9] JUSTICE C, 1998, IEEE T GEOSCI REMOTE, V36, P1313
  • [10] Metric-Aware Processing of Spherical Imagery
    Kazhdan, Michael
    Hoppe, Hugues
    [J]. ACM TRANSACTIONS ON GRAPHICS, 2010, 29 (06):