Analysis of protein function and its prediction from amino acid sequence

Cited by: 100
Authors
Clark, Wyatt T. [1]
Radivojac, Predrag [1]
Affiliations
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405 USA
Funding
National Science Foundation (USA);
Keywords
protein function; prediction; protein function transfer; gene ontology; neural network; GENE ONTOLOGY; MOONLIGHTING PROTEINS; INTRINSIC DISORDER; TWILIGHT ZONE; ANNOTATION; GENOMES; CLASSIFICATION; INFERENCE; IDENTITY; DATABASE;
DOI
10.1002/prot.23029
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in the context of human disease because many conditions arise as a consequence of alterations of protein function. The recent availability of relatively inexpensive sequencing technology has resulted in thousands of completely or partially sequenced genomes with millions of functionally uncharacterized proteins. Such a large volume of data, combined with the lack of high-throughput experimental assays to functionally annotate proteins, contributes to the growing importance of automated function prediction. Here, we study proteins annotated by Gene Ontology (GO) terms and estimate the accuracy of functional transfer from protein sequence only. We find that the transfer of GO terms by pairwise sequence alignments is only moderately accurate, showing a surprisingly small influence of sequence identity (SID) in a broad range (30-100%). We developed and evaluated functional annotator (FANN), a new predictor of protein function from amino acid sequence. The predictor exploits a multioutput neural network framework that is well suited to simultaneously modeling dependencies between functional terms. Experiments provide evidence that FANN-GO (predictor of GO terms; available from http://www.informatics.indiana.edu/predrag) outperforms standard methods such as transfer by global or local SID as well as GOtcha, a method that incorporates the structure of GO.
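The baseline the abstract refers to, transfer of GO terms by sequence identity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `transfer_go_terms`, the scoring rule (each term scored by the highest identity among annotated proteins that carry it), the `min_identity` cutoff, and the toy protein identifiers and identity values are all assumptions made for illustration. Pairwise identities are assumed to have been computed beforehand by an alignment tool.

```python
from collections import defaultdict


def transfer_go_terms(identities, annotations, min_identity=0.3):
    """Score GO terms for one query protein (hypothetical helper).

    identities:  dict mapping an annotated protein id to its sequence
                 identity with the query, as a fraction in [0, 1].
    annotations: dict mapping a protein id to a set of GO term ids.
    A term's score is the highest identity among the proteins that
    carry it; proteins below min_identity are ignored.
    """
    scores = defaultdict(float)
    for protein, identity in identities.items():
        if identity < min_identity:
            continue  # too dissimilar to transfer annotations from
        for term in annotations.get(protein, ()):
            scores[term] = max(scores[term], identity)
    return dict(scores)


# Toy data, invented for illustration only.
identities = {"P1": 0.92, "P2": 0.45, "P3": 0.20}
annotations = {
    "P1": {"GO:0003824"},
    "P2": {"GO:0003824", "GO:0005515"},
    "P3": {"GO:0016301"},
}
predicted = transfer_go_terms(identities, annotations)
```

Thresholding the returned scores gives a set of predicted terms; sweeping that threshold is one way such a baseline could be compared against a learned predictor.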
Pages: 2086-2096
Page count: 11