Bwasw-Cloud: Efficient Sequence Alignment Algorithm for Two Big Data with MapReduce

Cited by: 0
Authors
Sun, Mingming [1 ]
Zhou, Xuehai [1 ]
Yang, Feng [1 ]
Lu, Kun [1 ]
Dai, Dong [2 ]
Affiliations
[1] Univ Sci & Technol China, Comp Sci, Hefei 230026, Peoples R China
[2] Texas Tech Univ, Comp Sci, Lubbock, TX 79409 USA
Source
2014 FIFTH INTERNATIONAL CONFERENCE ON THE APPLICATIONS OF DIGITAL INFORMATION AND WEB TECHNOLOGIES (ICADIWT) | 2014
Funding
China Postdoctoral Science Foundation; U.S. National Science Foundation;
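Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract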
Recent next-generation sequencing machines generate sequences at an unprecedented rate, and the sequences they produce, called reads, are no longer short. The reference sequences against which reads are aligned are also increasingly large. Efficiently mapping a large number of long sequences to big reference sequences poses a new challenge for sequence alignment: alignment algorithms must now match two big data sets against each other. To address this problem, we propose a new parallel sequence alignment algorithm called Bwasw-Cloud, optimized for aligning long reads against large sequence data (e.g., the human genome). It is modeled after the widely used BWA-SW algorithm and uses the open-source Hadoop implementation of MapReduce. The results show that Bwasw-Cloud can match two big data sets effectively and quickly on an ordinary cluster.
Pages: 213-218
Page count: 6