Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment

Cited by: 29
Authors
Gotoh, Osamu [1, 2]
Morita, Mariko [1 ]
Nelson, David R. [3 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, CBRC, Koto Ku, Tokyo 1350064, Japan
[2] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol, Kashiwa, Chiba 2778561, Japan
[3] Univ Tennessee, Ctr Hlth Sci, Dept Microbiol Immunol & Biochem, Memphis, TN 38163 USA
Source
BMC BIOINFORMATICS | 2014 / Vol. 15
Funding
National Science Foundation (USA);
Keywords
Genome annotation; Gene prediction; Gene structure; Multiple sequence alignment; Spliced alignment; Cytochrome P450; Ribosomal proteins; SECONDARY STRUCTURE PREDICTION; GENOME DATABASE; ANNOTATION; IMPROVEMENT; ACCURACY; RECOGNITION; GENERATION; SIMILARITY; DIVERSITY; ALGORITHM;
DOI
10.1186/1471-2105-15-189
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. Results: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50 to 60% of the annotated gene structures are likely to contain some defects. Whereas more than half of the defect-containing genes may be intrinsically broken, i.e., they are pseudogenes or gene fragments, are located in unfinished sequencing areas, or correspond to non-productive isoforms, the defects found in a majority of the remaining gene candidates can be remedied by our iterative refinement method. Conclusions: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants.
Pages: 13