Quantifying Protein Function Specificity in the Gene Ontology

Cited by: 4
Authors
Louie, Brenton [1 ,2 ]
Bergen, Silas [1 ,3 ]
Higdon, Roger [1 ,2 ]
Kolker, Eugene [1 ,2 ,4 ]
Affiliations
[1] Seattle Childrens Res Inst, Bioinformat & High Throughput Anal Lab, Seattle, WA USA
[2] Seattle Childrens Hosp, Seattle, WA USA
[3] Univ Washington, Sch Publ Hlth, Dept Biostat, Seattle, WA 98195 USA
[4] Univ Washington, Sch Med, Dept Med Educ & Biomed Informat, Biomed & Hlth Informat Div, Seattle, WA 98195 USA
Keywords
protein annotation; protein function; function specificity; STRAIN RD KW20; HAEMOPHILUS-INFLUENZAE; SEMANTIC SIMILARITY; HYPOTHETICAL GENES; MODEL; EXPRESSION; SEQUENCE; BIOLOGY
DOI
10.4056/sigs.561626
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Quantitative or numerical metrics of protein function specificity made possible by the Gene Ontology are useful in that they enable development of distance or similarity measures between protein functions. Here we describe how to calculate four measures of function specificity for GO terms: 1) number of ancestor terms; 2) number of offspring terms; 3) proportion of terms; and 4) Information Content (IC). We discuss the relationship between the metrics and the strengths and weaknesses of each.
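The four measures named in the abstract can be illustrated on a toy ontology. The sketch below is not the authors' implementation: the abstract gives no formulas, so the definitions used here are assumptions. "Proportion of terms" is taken to be the share of the ontology made up of a term plus its offspring, and Information Content is taken to be the negative log (base 2) of the fraction of annotations that fall on a term or any of its offspring. The term names, the graph, and the annotation counts are all hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
# Like the Gene Ontology, it is a DAG, so a term may have several parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "ion_binding": ["binding"],
    "metal_ion_binding": ["ion_binding"],
    "kinase": ["catalysis", "binding"],
}

# Hypothetical counts of proteins annotated directly to each term.
ANNOTATIONS = {
    "root": 0,
    "binding": 2,
    "catalysis": 1,
    "ion_binding": 3,
    "metal_ion_binding": 2,
    "kinase": 2,
}


def ancestors(term):
    """Return every term reachable through parent links (term excluded)."""
    seen = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def offspring(term):
    """Return every term that has `term` among its ancestors."""
    return {t for t in PARENTS if term in ancestors(t)}


def proportion_of_terms(term):
    """Assumed definition: (offspring + the term itself) / all terms."""
    return (len(offspring(term)) + 1) / len(PARENTS)


def information_content(term):
    """Assumed definition: -log2 of the annotation fraction at or below term.

    Raises ValueError if the term and its offspring carry no annotations.
    """
    total = sum(ANNOTATIONS.values())
    count = ANNOTATIONS[term] + sum(ANNOTATIONS[t] for t in offspring(term))
    return -math.log2(count / total)


if __name__ == "__main__":
    for term in PARENTS:
        print(
            term,
            len(ancestors(term)),
            len(offspring(term)),
            round(proportion_of_terms(term), 3),
            round(information_content(term), 3),
        )
```

Under these assumed definitions, a more specific term has more ancestors, fewer offspring, a smaller proportion of terms, and a higher Information Content; the root sits at the opposite extreme on all four.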
Pages: 238-244
Number of pages: 7