BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database

Cited by: 707
Authors
Bruna, Tomas [1]
Hoff, Katharina J. [2, 3]
Lomsadze, Alexandre [4]
Stanke, Mario [2, 3]
Borodovsky, Mark [4, 5]
Affiliations
[1] Georgia Inst Technol, Sch Biol Sci, Atlanta, GA 30332 USA
[2] Univ Greifswald, Inst Math & Comp Sci, D-17489 Greifswald, Germany
[3] Univ Greifswald, Ctr Funct Genom Microbes, D-17489 Greifswald, Germany
[4] Georgia Inst Technol, Wallace H Coulter Dept Biomed Engn, Atlanta, GA 30332 USA
[5] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30332 USA
Funding
US National Institutes of Health;
Keywords
STRUCTURE PREDICTION; ALIGNMENT; PIPELINE; INSIGHTS; FINDER; GENES;
DOI
10.1093/nargab/lqaa108
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007; 090102;
Abstract
The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1 where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was a generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. It is favorably compared under equal conditions with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate solving the task of harmonization of annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies as well as in algorithmic development to reach the goal of highly accurate annotation of eukaryotic genomes.
Pages: 1-11
Number of pages: 11