Current methods for automated annotation of protein-coding genes

Cited by: 16
Authors
Hoff, K. J. [1]
Stanke, M. [1]
Affiliations
[1] Ernst Moritz Arndt Univ Greifswald, Inst Mathemat & Informat, Walther Rathenau Str 47, D-17487 Greifswald, Germany
Keywords
RNA-SEQ; GENOME ANNOTATION; PREDICTION; ALIGNMENTS; GENERATION; QUANTIFICATION; IDENTIFICATION; INTEGRATION; DROSOPHILA; ALGORITHM;
DOI
10.1016/j.cois.2015.02.008
Chinese Library Classification (CLC) number
Q [Biological sciences];
Subject classification codes
07; 0710; 09;
Abstract
We review software tools for gene prediction - the identification of protein-coding genes and their structure in genome sequences. The discussed approaches include methods based on RNA-Seq and current methods based on homology - comparative gene prediction and protein spliced alignments. Many methods require that their parameters are adjusted to the target species or its broader clade. These include ab initio gene finders, integrated approaches with ab initio components and some aligners. We also review current automatic methods for training for the common case that a bona fide training set of gene structures is not available before annotation.
Pages: 8-14
Page count: 7