BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database

Times cited: 707
Authors
Bruna, Tomas [1]
Hoff, Katharina J. [2,3]
Lomsadze, Alexandre [4]
Stanke, Mario [2,3]
Borodovsky, Mark [4,5]
Affiliations
[1] Georgia Inst Technol, Sch Biol Sci, Atlanta, GA 30332 USA
[2] Univ Greifswald, Inst Math & Comp Sci, D-17489 Greifswald, Germany
[3] Univ Greifswald, Ctr Funct Genom Microbes, D-17489 Greifswald, Germany
[4] Georgia Inst Technol, Wallace H Coulter Dept Biomed Engn, Atlanta, GA 30332 USA
[5] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
National Institutes of Health (NIH);
Keywords
STRUCTURE PREDICTION; ALIGNMENT; PIPELINE; INSIGHTS; FINDER; GENES;
DOI
10.1093/nargab/lqaa108
Chinese Library Classification number
Q3 [Genetics];
Subject classification codes
071007 ; 090102 ;
Abstract
Eukaryotic genome annotation remains a challenging task. Only a few genomes, annotated through a tremendous investment of human curation effort, can serve as annotation standards. Even in the best-annotated genomes, the correctness of all alternative isoforms remains a worthwhile subject for further investigation. The new BRAKER2 pipeline generates external protein support and integrates it into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, in which self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. Under equal conditions, it compares favorably with other pipelines, e.g. MAKER2, in terms of accuracy and performance. The development of BRAKER2 should facilitate harmonizing the annotation of protein-coding genes across the genomes of different eukaryotic species. However, we fully understand that further innovations in transcriptomic and proteomic technologies, as well as in algorithm development, are needed to reach the goal of highly accurate annotation of eukaryotic genomes.
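The abstract's central technical idea, turning alignments of likely homologous proteins into hints about exon boundaries and trusting only well-supported ones, can be illustrated with a small Python sketch. It is not the pipeline's actual code: the GFF-style column layout, the `src=P;mult=N` attribute convention (modeled loosely on AUGUSTUS-style hint files), the names `IntronHint`, `parse_hint` and `high_confidence`, and the support threshold of 4 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class IntronHint:
    seqid: str
    start: int   # 1-based, inclusive (GFF convention)
    end: int
    strand: str
    mult: int    # assumed: number of protein alignments supporting this intron


def parse_hint(line: str) -> IntronHint:
    """Parse one tab-separated GFF-style intron hint line (assumed layout)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "intron":
        raise ValueError(f"not an intron hint line: {line!r}")
    # Column 9 holds key=value pairs separated by ';' (hypothetical convention).
    attrs = dict(
        field.strip().split("=", 1)
        for field in cols[8].split(";")
        if "=" in field
    )
    return IntronHint(
        seqid=cols[0],
        start=int(cols[3]),
        end=int(cols[4]),
        strand=cols[6],
        mult=int(attrs.get("mult", "1")),  # missing mult counts as one alignment
    )


def high_confidence(hints: Iterable[IntronHint], min_mult: int = 4) -> List[IntronHint]:
    """Keep introns backed by at least `min_mult` alignments (hypothetical threshold)."""
    return [h for h in hints if h.mult >= min_mult]


# Usage with two made-up hint lines: only the well-supported intron survives.
lines = [
    "chr1\tproteins\tintron\t1001\t1200\t.\t+\t.\tsrc=P;mult=5;",
    "chr1\tproteins\tintron\t1501\t1650\t.\t+\t.\tsrc=P;mult=1;",
]
hints = [parse_hint(line) for line in lines]
kept = high_confidence(hints)
print([(h.start, h.end) for h in kept])  # [(1001, 1200)]
```

Filtering on support multiplicity is one simple way to favor boundaries that several independent protein alignments agree on; the criteria the real pipeline applies may differ.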
Pages: 1-11
Number of pages: 11