Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment
Cited by: 29
Authors:
Gotoh, Osamu [1,2]
Morita, Mariko [1]
Nelson, David R. [3]
Affiliations:
[1] Natl Inst Adv Ind Sci & Technol, CBRC, Koto Ku, Tokyo 1350064, Japan
[2] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol, Kashiwa, Chiba 2778561, Japan
[3] Univ Tennessee, Ctr Hlth Sci, Dept Microbiol Immunol & Biochem, Memphis, TN 38163 USA
Background: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods.

Results: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50-60% of the annotated gene structures are likely to contain some defects. More than half of the defect-containing genes may be intrinsically broken, i.e., they are pseudogenes or gene fragments, are located in unfinished sequencing areas, or correspond to non-productive isoforms. The defects found in a majority of the remaining gene candidates, however, can be remedied by our iterative refinement method.

Conclusions: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants.