SeedsGraph: an efficient assembler for next-generation sequencing data

Cited by: 2

Authors:

Wang, Chunyu [1]

Guo, Maozu [1]

Liu, Xiaoyan [1]

Liu, Yang [1]

Zou, Quan [2]

Affiliations:

[1] Harbin Inst Technol, Sch Comp Sci & Technol, 92 West Dazhi St, Harbin 150001, Peoples R China

[2] Xiamen Univ, Dept Comp Sci, Xiamen 361005, Peoples R China

Source:

BMC MEDICAL GENOMICS | 2015 / Volume 8

Funding:

National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education;

Keywords:

ALGORITHMS; GENOMES;

DOI:

10.1186/1755-8794-8-S2-S13

Chinese Library Classification:

Q3 [Genetics];

Discipline classification codes:

071007 ; 090102 ;

Abstract:

DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framework, and the clustering process groups fragments according to their original consensus long-sequence similarity. We condense each group of reads to a chain of seeds, which is a kind of substring with reads aligned, and then build a graph accordingly. Finally, we analyze the graph to find Euler paths, and assemble the reads related in the paths into contigs, and then lay out contigs with mate-pair information for scaffolds. The result shows that our algorithm is efficient and feasible for a large set of reads such as in next-generation sequencing technology.
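The graph-traversal step mentioned in the abstract can be illustrated with a minimal sketch. This is not the SeedsGraph implementation: the record does not describe how seeds or the seed graph are constructed, so the sketch substitutes a plain de Bruijn-style k-mer graph and finds an Euler path with Hierholzer's algorithm. The function names `build_graph`, `euler_path`, and `spell_contig` are hypothetical, and the sketch assumes error-free reads, a single strand, and a graph that actually contains an Euler path. Clustering and mate-pair scaffolding are left out.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a de Bruijn-style graph: one edge per distinct k-mer.

    Nodes are (k-1)-mers; the k-mer 'ATG' becomes the edge 'AT' -> 'TG'.
    This stands in for the paper's seed graph, whose construction is
    not given in this record.
    """
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make runs reproducible
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def euler_path(graph):
    """Return a node sequence that uses every edge once (Hierholzer).

    Assumes such a path exists; no validation is performed.
    """
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # An Euler path starts at the node with one more outgoing than
    # incoming edge; if there is none (a cycle), any node with edges works.
    start = next(
        (u for u in sorted(graph) if len(graph[u]) - in_deg[u] == 1),
        min(graph),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is finalized
    return path[::-1]


def spell_contig(path):
    """Concatenate the path: first node plus the last base of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    reads = ["ATGGC", "GGCGT", "CGTGC", "GTGCA"]
    contig = spell_contig(euler_path(build_graph(reads, 3)))
    print(contig)
```

When the graph has branching nodes, more than one Euler path can exist, so the spelled contig is one valid reconstruction rather than a unique answer. Resolving that ambiguity is where additional evidence such as mate-pair information becomes useful.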

Pages: 9

References (18 in total):

[1] Cloud computing [J]. Bateman, Alex; Wood, Matt. BIOINFORMATICS, 2009, 25 (12): 1475-1475

[2] Batzoglou S, 2002, GENOME RES, V12, P177, DOI 10.1101/gr.208902

[3] Short read fragment assembly of bacterial genomes [J]. Chaisson, Mark J.; Pevzner, Pavel A. GENOME RESEARCH, 2008, 18 (02): 324-330

[4] Mapreduce: Simplified data processing on large clusters [J]. Dean, Jeffrey; Ghemawat, Sanjay. COMMUNICATIONS OF THE ACM, 2008, 51 (01): 107-113

[5] Ghemawat S, 2003, ACM SIGOPS Operating Systems Review, P29, DOI [10.1145/1165389.945450, 10.1145/945445.945450]

[6] Readjoiner: a fast and memory efficient string graph-based sequence assembler [J]. Gonnella, Giorgio; Kurtz, Stefan. BMC BIOINFORMATICS, 2012, 13

[7] A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes [J]. Kurtz, Stefan; Narechania, Apurva; Stein, Joshua C.; Ware, Doreen. BMC GENOMICS, 2008, 9 (1): 517

[8] The Sequence Read Archive [J]. Leinonen, Rasko; Sugawara, Hideaki; Shumway, Martin. NUCLEIC ACIDS RESEARCH, 2011, 39: D19-D21

[9] Genome sequencing in microfabricated high-density picolitre reactors [J]. Margulies, M; Egholm, M; Altman, WE; Attiya, S; Bader, JS; Bemben, LA; Berka, J; Braverman, MS; Chen, YJ; Chen, ZT; Dewell, SB; Du, L; Fierro, JM; Gomes, XV; Godwin, BC; He, W; Helgesen, S; Ho, CH; Irzyk, GP; Jando, SC; Alenquer, MLI; Jarvie, TP; Jirage, KB; Kim, JB; Knight, JR; Lanza, JR; Leamon, JH; Lefkowitz, SM; Lei, M; Li, J; Lohman, KL; Lu, H; Makhijani, VB; McDade, KE; McKenna, MP; Myers, EW; Nickerson, E; Nobile, JR; Plant, R; Puc, BP; Ronan, MT; Roth, GT; Sarkis, GJ; Simons, JF; Simpson, JW; Srinivasan, M; Tartaro, KR; Tomasz, A; Vogt, KA; Volkmer, GA. NATURE, 2005, 437 (7057): 376-380

[10] Assembly algorithms for next-generation sequencing data [J]. Miller, Jason R.; Koren, Sergey; Sutton, Granger. GENOMICS, 2010, 95 (06): 315-327
