A modified GC-specific MAKER gene annotation method reveals improved and novel gene predictions of high and low GC content in Oryza sativa

Cited by: 19
Authors
Bowman, Megan J. [1, 2]
Pulman, Jane A. [1, 3, 4]
Liu, Tiffany L. [1]
Childs, Kevin L. [1, 3]
Affiliations
[1] Michigan State Univ, Dept Plant Biol, 612 Wilson Rd,Room 166, E Lansing, MI 48824 USA
[2] Van Andel Res Inst, Grand Rapids, MI 49506 USA
[3] Michigan State Univ, Ctr Genom Enabled Plant Sci, E Lansing, MI 48824 USA
[4] Univ Liverpool, Ctr Genom Res, Liverpool L69 7ZB, Merseyside, England
Funding
U.S. National Science Foundation;
Keywords
QUALITY-CONTROL; GENOME; MODEL; ARABIDOPSIS; EVOLUTION;
DOI
10.1186/s12859-017-1942-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Accurate structural annotation depends on well-trained gene prediction programs. Training data for gene prediction programs are often chosen randomly from a subset of high-quality genes that ideally represent the variation found within a genome. One aspect of gene variation is GC content, which differs across species and is bimodal in grass genomes. When gene prediction programs are trained on a subset of grass genes with random GC content, they are effectively being trained on two classes of genes at once, and this can be expected to degrade prediction quality when genes are predicted in new genome sequences.

Results: We find that gene prediction programs trained on grass genes with random GC content do not completely predict all grass genes with extreme GC content. We show that gene prediction programs that are trained with grass genes with high or low GC content can make both better and unique gene predictions compared to gene prediction programs that are trained on genes with random GC content. By separately training gene prediction programs with genes from multiple GC ranges and using the programs within the MAKER genome annotation pipeline, we were able to improve the annotation of the Oryza sativa genome compared to using the standard MAKER annotation protocol. Gene structure was improved in over 13% of genes, and 651 novel genes were predicted by the GC-specific MAKER protocol.

Conclusions: We present a new GC-specific MAKER annotation protocol to predict new and improved gene models and assess the biological significance of this method in Oryza sativa. We expect that this protocol will also be beneficial for gene prediction in any organism with bimodal or other unusual gene GC content.
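
The idea of assembling separate training sets by GC range can be illustrated with a minimal sketch. It is not the authors' code: the function names `gc_content` and `partition_by_gc` and the cutoffs `low_max=0.45` and `high_min=0.60` are hypothetical values chosen only for illustration, and the abstract does not state the thresholds actually used. The sketch computes the GC fraction of each gene sequence and assigns gene IDs to low, mid, or high bins, each of which could then serve as a separate training set for a gene predictor.

```python
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Return the G+C fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    # Sequences with no unambiguous bases get 0.0 rather than raising.
    return gc / total if total else 0.0


def partition_by_gc(
    genes: Dict[str, str],
    low_max: float = 0.45,   # hypothetical cutoff, not taken from the paper
    high_min: float = 0.60,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, List[str]]:
    """Assign each gene ID to a 'low', 'mid', or 'high' GC bin."""
    bins: Dict[str, List[str]] = {"low": [], "mid": [], "high": []}
    for gene_id, seq in genes.items():
        gc = gc_content(seq)
        if gc <= low_max:
            bins["low"].append(gene_id)
        elif gc >= high_min:
            bins["high"].append(gene_id)
        else:
            bins["mid"].append(gene_id)
    return bins


if __name__ == "__main__":
    # Toy sequences, invented for the example.
    toy_genes = {
        "geneA": "ATATATGC",
        "geneB": "GGCCGGCA",
        "geneC": "ATGC",
    }
    print(partition_by_gc(toy_genes))
```

In practice the cutoffs would be chosen from the observed GC distribution of the genome being annotated, for example around the two modes of a bimodal distribution.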
Pages: 15