Speeding-up codon analysis on the cloud with local MapReduce aggregation

Cited by: 9
Authors
Radenski, Atanas [1]
Ehwerhemuepha, Louis [1]
Affiliation
[1] Chapman Univ, Sch Computat Sci, Schmid Coll Sci & Technol, Orange, CA 92866 USA
Keywords
Codon analysis; Hadoop; MapReduce; Local aggregation; Cloud computing; Optimization; Framework; Efficient; Genes; DNA
DOI
10.1016/j.ins.2013.11.028
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
A notable obstacle to higher performance of data-intensive Hadoop MapReduce (MR) bioinformatics algorithms is the large volume of intermediate data that must be sorted, shuffled, and transmitted between mapper and reducer tasks. This difficulty is especially evident in MR codon analysis, which is known to generate voluminous intermediate data that create a bottleneck in basic MR codon analysis algorithms. Our proposed approach to this intermediate data bottleneck is local in-mapper aggregation (or simply local aggregation), a technique that helps reduce the volume of intermediate data passed between mapper and reducer tasks. We evaluate local aggregation (i) by developing MR codon analysis algorithms with and without local aggregation and (ii) by measuring their performance experimentally on Amazon Web Services (AWS), the Amazon cloud platform. Codon analysis with local aggregation maintains consistently high performance as datasets grow, whereas basic codon analysis without local aggregation becomes impractically slow even for smaller datasets. Our results can benefit (i) members of the bioinformatics community who need to perform fast and cost-effective MR nucleotide analysis on the cloud and (ii) computer scientists who strive to increase the performance of MR algorithms. (C) 2013 Elsevier Inc. All rights reserved.
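To make the contrast between basic mapping and local in-mapper aggregation concrete, here is a minimal single-process Python sketch. It is not the authors' Hadoop implementation: it assumes that "codon analysis" means counting non-overlapping triplets in reading frame 0 of each sequence, it treats each input split as the input of one mapper task, and all function names (`codons`, `basic_mapper`, `aggregating_mapper`, `shuffle_and_reduce`, `run_job`) are hypothetical. The point it illustrates is that both mappers yield the same final counts while the aggregating mapper emits far fewer intermediate pairs for the shuffle.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def codons(seq: str) -> Iterator[str]:
    """Yield non-overlapping triplets in reading frame 0 (a hypothetical
    simplification); triplets containing non-ACGT symbols are skipped."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):
            yield codon


def basic_mapper(records: Iterable[str]) -> Iterator[Pair]:
    """Basic MR mapper: emit one (codon, 1) pair per codon occurrence."""
    for seq in records:
        for codon in codons(seq):
            yield codon, 1


def aggregating_mapper(records: Iterable[str]) -> Iterator[Pair]:
    """In-mapper aggregation: accumulate counts in a local dictionary and emit
    each codon once, after the whole split is processed (in Hadoop this flush
    would typically happen in the mapper's cleanup step)."""
    local: Dict[str, int] = defaultdict(int)
    for seq in records:
        for codon in codons(seq):
            local[codon] += 1
    yield from local.items()


def shuffle_and_reduce(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Group intermediate pairs by key (the shuffle), then sum each group (the reducer)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}


def run_job(
    splits: Iterable[Iterable[str]],
    mapper: Callable[[Iterable[str]], Iterator[Pair]],
) -> Tuple[Dict[str, int], int]:
    """Run one mapper per split; return the final counts and the number of
    intermediate pairs that had to be shuffled to the reducer."""
    intermediate: List[Pair] = []
    for split in splits:
        intermediate.extend(mapper(split))
    return shuffle_and_reduce(intermediate), len(intermediate)


if __name__ == "__main__":
    splits = [["ATGGCCATGTAA", "ATGATGATG"], ["GCCGCCTAA"]]
    basic_counts, basic_pairs = run_job(splits, basic_mapper)
    agg_counts, agg_pairs = run_job(splits, aggregating_mapper)
    print(basic_pairs, agg_pairs)       # 10 5
    print(basic_counts == agg_counts)   # True
```

Because there are only 4^3 = 64 possible codons, the per-mapper dictionary stays small no matter how large a split is, which is what makes in-mapper aggregation a natural fit for codon counting: the number of pairs each mapper emits is bounded by 64 rather than growing with the number of codon occurrences.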
Pages: 175-185
Number of pages: 11
Number of references: 53