Assessment and refinement of eukaryotic gene structure prediction with gene-structure-aware multiple protein sequence alignment

Cited by: 29

Authors:

Gotoh, Osamu [1,2]

Morita, Mariko [1]

Nelson, David R. [3]

Affiliations:

[1] Natl Inst Adv Ind Sci & Technol, CBRC, Koto Ku, Tokyo 1350064, Japan

[2] Univ Tokyo, Grad Sch Frontier Sci, Dept Computat Biol, Kashiwa, Chiba 2778561, Japan

[3] Univ Tennessee, Ctr Hlth Sci, Dept Microbiol Immunol & Biochem, Memphis, TN 38163 USA

Source:

BMC BIOINFORMATICS | 2014 / Vol. 15

Funding:

National Science Foundation (USA);

Keywords:

Genome annotation; Gene prediction; Gene structure; Multiple sequence alignment; Spliced alignment; Cytochrome P450; Ribosomal proteins; SECONDARY STRUCTURE PREDICTION; GENOME DATABASE; ANNOTATION; IMPROVEMENT; ACCURACY; RECOGNITION; GENERATION; SIMILARITY; DIVERSITY; ALGORITHM;

DOI:

10.1186/1471-2105-15-189

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification code:

071010 ; 081704 ;

Abstract:

Background: Accurate computational identification of eukaryotic gene organization is a long-standing problem. Despite the fundamental importance of precise annotation of genes encoded in newly sequenced genomes, the accuracy of predicted gene structures has not been critically evaluated, mostly due to the scarcity of proper assessment methods. Results: We present a gene-structure-aware multiple sequence alignment method for gene prediction using amino acid sequences translated from homologous genes from many genomes. The approach provides rich information concerning the reliability of each predicted gene structure. We have also devised an iterative method that attempts to improve the structures of suspiciously predicted genes based on a spliced alignment algorithm using consensus sequences or reliable homologs as templates. Application of our methods to cytochrome P450 and ribosomal proteins from 47 plant genomes indicated that 50–60% of the annotated gene structures are likely to contain some defects. Whereas more than half of the defect-containing genes may be intrinsically broken, i.e., they are pseudogenes or gene fragments, located in unfinished sequencing areas, or corresponding to non-productive isoforms, the defects found in a majority of the remaining gene candidates can be remedied by our iterative refinement method. Conclusions: Refinement of eukaryotic gene structures mediated by gene-structure-aware multiple protein sequence alignment is a useful strategy to dramatically improve the overall prediction quality of a set of homologous genes. Our method will be applicable to various families of protein-coding genes if their domain structures are evolutionarily stable. It is also feasible to apply our method to gene families from all kingdoms of life, not just plants.


Pages: 13
