Case Study of Scientific Data Processing on a Cloud Using Hadoop

Cited by: 0
Authors
Zhang, Chen [1]
De Sterck, Hans [2]
Aboulnaga, Ashraf [1]
Djambazian, Haig [3]
Sladek, Rob
Affiliations
[1] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
[2] Univ Waterloo, Dept Appl Math, Waterloo, ON N2L 3G1, Canada
[3] McGill Univ, Genome Quebec Innovat Ctr, Montreal, PQ H3A 1A4, Canada
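Source
HIGH PERFORMANCE COMPUTING SYSTEMS AND APPLICATIONS | 2010 / Vol. 5976
Keywords
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract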
With the increasing popularity of cloud computing, Hadoop has become a widely used open source cloud computing framework for large scale data processing. However, few efforts have been made to demonstrate the applicability of Hadoop to various real-world application scenarios in fields other than server side computations such as web indexing, etc. In this paper, we use the Hadoop cloud computing framework to develop a user application that allows processing of scientific data on clouds. A simple extension to Hadoop's MapReduce is described which allows it to handle scientific data processing problems with arbitrary input formats and explicit control over how the input is split. This approach is used to develop a Hadoop-based cloud computing application that processes sequences of microscope images of live cells, and we test its performance. It is discussed how the approach can be generalized to more complicated scientific data processing problems.
Pages: 400 / +
Page count: 3
Related papers
50 in total
  • [31] Processing LIDAR Data with Apache Hadoop
    Ruzicka, Jan
    Orcik, Lukas
    Ruzickova, Katerina
    Kisztner, Juraj
    RISE OF BIG SPATIAL DATA, 2017, : 351 - 358
  • [32] Choosing Optimal Maintenance Time for Stateless Data-Processing Clusters A Case Study of Hadoop Cluster
    Zhuang, Zhenyun
    Shen, Min
    Ramachandra, Haricharan
    Viswesan, Suja
    JOB SCHEDULING STRATEGIES FOR PARALLEL PROCESSING, JSSPP 2016, 2017, 10353 : 252 - 273
  • [33] Traffic Analysis using Hadoop Cloud
    Aishwarya, K.
    Sankar, Sharmila
    2015 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION, EMBEDDED AND COMMUNICATION SYSTEMS (ICIIECS), 2015,
  • [34] Impact of Processing and Analyzing Healthcare Big Data on Cloud Computing Environment by Implementing Hadoop Cluster
    Rallapalli, Sreekanth
    Gondkar, R. R.
    Ketavarapu, Uma Pavan Kumar
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL MODELLING AND SECURITY (CMS 2016), 2016, 85 : 16 - 22
  • [35] Moving Hadoop to the Cloud for Big Data Analytics
    Astrova, Irina
    Koschel, Arne
    Heine, Felix
    Kalja, Ahto
    DATABASES AND INFORMATION SYSTEMS X (DB&IS 2018), 2019, 315 : 195 - 209
  • [36] A Hybrid Cloud Computing Approach for Intelligent Processing and Storage of Scientific Data
    Horat, David
    Quevedo, Eduardo
    Quesada-Arencibia, Alexis
    COMPUTER AIDED SYSTEMS THEORY, PT 1, 2013, 8111 : 182 - 188
  • [37] NEAR REAL-TIME PROCESSING OF PROTEOMICS DATA USING HADOOP
    Hillman, Chris
    Ahmad, Yasmeen
    Whitehorn, Mark
    Cobley, Andy
    BIG DATA, 2014, 2 (01) : 44 - 49
  • [38] Distributed processing using cosine similarity for mapping Big Data in Hadoop
    Rojas, A. F.
    Gelvez, N. Y.
    IEEE LATIN AMERICA TRANSACTIONS, 2016, 14 (06) : 2857 - 2861
  • [39] Processing Real World Datasets using Big Data Hadoop Tools
    Deshai, N.
    Sekhar, B. V. D. S.
    Reddy, P. V. G. D. Prasad
    Chakravarthy, V. V. S. S. S.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2020, 79 (07) : 631 - 635
  • [40] Minimizing Big Data Problems using Cloud Computing Based on Hadoop Architecture
    Adnan, Muhammad
    Afzal, Muhammad
    Aslam, Muhammad
    Jan, Roohl
    Martinez-Enriquez, A. M.
    2014 11TH ANNUAL HIGH CAPACITY OPTICAL NETWORKS AND EMERGING/ENABLING TECHNOLOGIES (PHOTONICS FOR ENERGY), 2014, : 99 - 103