SEAL: a distributed short read mapping and duplicate removal tool

被引:85
作者
Pireddu, Luca [1 ]
Leo, Simone [1 ]
Zanetti, Gianluigi [1 ]
机构
[1] Polaris, CRS4, I-09010 Pula, Italy
关键词
ALIGNMENT;
D O I
10.1093/bioinformatics/btr325
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it is capable of processing about 13GB per hour in map+rmdup mode, while reaching a throughput of 19GB per hour in mapping-only mode.
引用
收藏
页码:2159 / 2160
页数:2
相关论文
共 11 条
[1]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, DebbieA. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[2]  
[Anonymous], 2010, Proceedings of the International Symposium on High Performance Distributed Computing (HPDC '10), DOI [10.1145/1851476.1851594, DOI 10.1145/1851476.1851594]
[3]  
Dean J, 2004, OSDI, P137
[4]   The UCSC Genome Browser database: update 2011 [J].
Fujita, Pauline A. ;
Rhead, Brooke ;
Zweig, Ann S. ;
Hinrichs, Angie S. ;
Karolchik, Donna ;
Cline, Melissa S. ;
Goldman, Mary ;
Barber, Galt P. ;
Clawson, Hiram ;
Coelho, Antonio ;
Diekhans, Mark ;
Dreszer, Timothy R. ;
Giardine, Belinda M. ;
Harte, Rachel A. ;
Hillman-Jackson, Jennifer ;
Hsu, Fan ;
Kirkup, Vanessa ;
Kuhn, Robert M. ;
Learned, Katrina ;
Li, Chin H. ;
Meyer, Laurence R. ;
Pohl, Andy ;
Raney, Brian J. ;
Rosenbloom, Kate R. ;
Smith, Kayla E. ;
Haussler, David ;
Kent, W. James .
NUCLEIC ACIDS RESEARCH, 2011, 39 :D876-D882
[5]  
Illumina Incorporation, 2009, SEQ AN SOFTW US GUID
[6]  
Kozarewa I, 2009, NAT METHODS, V6, P291, DOI [10.1038/NMETH.1311, 10.1038/nmeth.1311]
[7]   Searching for SNPs with cloud computing [J].
Langmead, Ben ;
Schatz, Michael C. ;
Lin, Jimmy ;
Pop, Mihai ;
Salzberg, Steven L. .
GENOME BIOLOGY, 2009, 10 (11)
[8]   Ultrafast and memory-efficient alignment of short DNA sequences to the human genome [J].
Langmead, Ben ;
Trapnell, Cole ;
Pop, Mihai ;
Salzberg, Steven L. .
GENOME BIOLOGY, 2009, 10 (03)
[9]   A survey of sequence alignment algorithms for next-generation sequencing [J].
Li, Heng ;
Homer, Nils .
BRIEFINGS IN BIOINFORMATICS, 2010, 11 (05) :473-483
[10]   Fast and accurate short read alignment with Burrows-Wheeler transform [J].
Li, Heng ;
Durbin, Richard .
BIOINFORMATICS, 2009, 25 (14) :1754-1760