Long-Read Annotation: Automated Eukaryotic Genome Annotation Based on Long-Read cDNA Sequencing

Cited by: 36
Authors
Cook, David E. [1, 3]
Valle-Inclan, Jose Espejo [1, 4]
Pajoro, Alice [2, 5]
Rovenich, Hanna [1, 6]
Thomma, Bart P. H. J. [1]
Faino, Luigi [1, 7]
Affiliations
[1] Wageningen Univ & Res, Lab Phytopathol, Droevendaalsesteeg 1, NL-6708 PB Wageningen, Netherlands
[2] Wageningen Univ & Res, Lab Mol Biol, Droevendaalsesteeg 1, NL-6708 PB Wageningen, Netherlands
[3] Kansas State Univ, Dept Plant Pathol, Manhattan, KS 66056 USA
[4] Univ Utrecht, Univ Med Ctr Utrecht, Ctr Mol Med, Dept Genet, NL-3584 CX Utrecht, Netherlands
[5] Max Planck Inst Plant Breeding Res, Dept Plant Dev Biol, D-50829 Cologne, Germany
[6] Univ Cologne, Bot Inst, Cluster Excellence Plant Sci CEPLAS, D-50674 Cologne, Germany
[7] Univ Roma La Sapienza, Dept Environm Biol, Ple Aldo Moro 5, I-00185 Rome, Italy
Keywords
ARABIDOPSIS INFORMATION RESOURCE; RNA-SEQ; PROVIDES INSIGHTS; GENE PREDICTION; TRANSCRIPTOME; CHROMOSOME; EVOLUTION; ORTHOMCL; TOOL
DOI
10.1104/pp.18.00848
Chinese Library Classification number
Q94 [Botany]
Discipline classification code
071001
Abstract
Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations. Based on annotations of two fungal genomes (Verticillium dahliae and Plicaturopsis crispa) and two plant genomes (Arabidopsis [Arabidopsis thaliana] and Oryza sativa), we show that LoReAn outperforms popular annotation pipelines by integrating single-molecule cDNA-sequencing data generated from either the Pacific Biosciences or MinION sequencing platforms, correctly predicting gene structure, and capturing genes missed by other annotation pipelines.
Pages: 38-54
Number of pages: 17