The effects of shared information on semantic calculations in the gene ontology

Cited by: 3
Authors
Bible, Paul W. [1]
Sun, Hong-Wei [2]
Morasso, Maria I. [3]
Loganantharaj, Rasiah [4]
Wei, Lai [1]
Affiliations
[1] Sun Yat Sen Univ, Zhongshan Ophthalm Ctr, State Key Lab Ophthalmol, Guangzhou 510060, Guangdong, Peoples R China
[2] NIAMSD, Biodata Min & Discovery Sect, Off Sci & Technol, Intramural Res Program, Bethesda, MD 20892 USA
[3] NIAMSD, Skin Biol Lab, Intramural Res Program, Bethesda, MD USA
[4] Univ Louisiana Lafayette, Ctr Adv Comp Studies, Lab Bioinformat, Lafayette, LA 70504 USA
Source
COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL | 2017 / Vol. 15
Funding
National Institutes of Health (USA); National Natural Science Foundation of China;
Keywords
Semantic similarity; Gene ontology; Function prediction; Machine learning; Protein-protein interaction; Gene expression; SIMILARITY MEASURES; TERMS;
DOI
10.1016/j.csbj.2017.01.009
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular biology];
Discipline classification code
071010 ; 081704 ;
Abstract
The structured vocabulary that describes gene function, the gene ontology (GO), serves as a powerful tool in biological research. One application of GO in computational biology calculates semantic similarity between two concepts to make inferences about the functional similarity of genes. A class of term similarity algorithms explicitly calculates the shared information (SI) between concepts, then substitutes this calculation into traditional term similarity measures such as Resnik, Lin, and Jiang-Conrath. Alternative SI approaches, when combined with ontology choice and term similarity type, lead to many gene-to-gene similarity measures. No thorough investigation has been made into the behavior, complexity, and performance of semantic methods derived from distinct SI approaches. We apply bootstrapping to compare the generalized performance of 57 gene-to-gene semantic measures across six benchmarks. Considering the number of measures, we additionally evaluate whether these methods can be leveraged through ensemble machine learning to improve prediction performance. Results showed that the choice of ontology type most strongly influenced performance across all evaluations. Combining measures into an ensemble classifier reduces cross-validation error beyond any individual measure for protein interaction prediction. This improvement resulted from information gained through the combination of ontology types, as ensemble methods within each GO type offered no improvement. These results demonstrate that multiple SI measures can be leveraged for machine learning tasks such as automated gene function prediction by incorporating methods from across the ontologies. To facilitate future research in this area, we developed the GO Graph Tool Kit (GGTK), an open source C++ library with Python interface (github.com/paulbible/ggtk). © 2017 The Authors. Published by Elsevier B.V.
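As an illustration of the substitution step the abstract describes, the following minimal sketch computes a shared information (SI) value and plugs it into the Resnik, Lin, and Jiang-Conrath formulas. It is not the GGTK API and not the authors' implementation: the toy ontology, the term probabilities, and every function name are hypothetical, and SI is taken here to be the information content of the most informative common ancestor, which is only one of the SI approaches the paper compares.

```python
import math

# Hypothetical toy ontology (a DAG): term -> list of parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["A"],
    "D": ["root"],
}

# Hypothetical annotation probabilities p(term); the root covers everything.
PROB = {"root": 1.0, "A": 0.5, "B": 0.25, "C": 0.25, "D": 0.125}


def ic(term):
    """Information content: IC(t) = -log p(t)."""
    return -math.log(PROB[term])


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_information(a, b):
    """Assumed SI approach: IC of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(ic(t) for t in common)


def resnik(a, b):
    """Resnik similarity is the shared information itself."""
    return shared_information(a, b)


def lin(a, b):
    """Lin similarity: 2 * SI / (IC(a) + IC(b))."""
    denom = ic(a) + ic(b)
    return 2.0 * shared_information(a, b) / denom if denom > 0 else 0.0


def jiang_conrath_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * SI."""
    return ic(a) + ic(b) - 2.0 * shared_information(a, b)


if __name__ == "__main__":
    for pair in [("B", "C"), ("B", "D"), ("B", "B")]:
        print(pair, resnik(*pair), lin(*pair), jiang_conrath_distance(*pair))
```

Swapping in a different `shared_information` function while leaving the three measures untouched is the kind of variation that, combined with ontology choice, yields the many gene-to-gene measures mentioned above.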
Pages: 195-211
Page count: 17
Related papers
49 in total
  • [1] BASIC LOCAL ALIGNMENT SEARCH TOOL
    ALTSCHUL, SF
    GISH, W
    MILLER, W
    MYERS, EW
    LIPMAN, DJ
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) : 403 - 410
  • [2] [Anonymous], 1998, An information-theoretic definition of similarity
  • [3] Ashburner M, 2001, GENOME RES, V11, P1425, DOI 10.1101/gr.180801
  • [4] Gene Ontology: tool for the unification of biology
    Ashburner, M
    Ball, CA
    Blake, JA
    Botstein, D
    Butler, H
    Cherry, JM
    Davis, AP
    Dolinski, K
    Dwight, SS
    Eppig, JT
    Harris, MA
    Hill, DP
    Issel-Tarver, L
    Kasarskis, A
    Lewis, S
    Matese, JC
    Richardson, JE
    Ringwald, M
    Rubin, GM
    Sherlock, G
    [J]. NATURE GENETICS, 2000, 25 (01) : 25 - 29
  • [5] Azuaje F, 2006, ICDM 2006: Sixth IEEE International Conference on Data Mining, Workshops, P114
  • [6] The GOA database in 2009-an integrated Gene Ontology Annotation resource
    Barrell, Daniel
    Dimmer, Emily
    Huntley, Rachael P.
    Binns, David
    O'Donovan, Claire
    Apweiler, Rolf
    [J]. NUCLEIC ACIDS RESEARCH, 2009, 37 : D396 - D403
  • [7] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [8] Unequal evolutionary conservation of human protein interactions in interologous networks.
    Brown, Kevin V.
    Jurisica, Igor
    [J]. GENOME BIOLOGY, 2007, 8 (05)
  • [9] Toward a comprehensive atlas of the physical interactome of Saccharomyces cerevisiae
    Collins, Sean R.
    Kemmeren, Patrick
    Zhao, Xue-Chu
    Greenblatt, Jack F.
    Spencer, Forrest
    Holstege, Frank C. P.
    Weissman, Jonathan S.
    Krogan, Nevan J.
    [J]. MOLECULAR & CELLULAR PROTEOMICS, 2007, 6 (03) : 439 - 450
  • [10] Measuring semantic similarity between Gene Ontology terms
    Couto, Francisco M.
    Silva, Mario J.
    Coutinho, Pedro M.
    [J]. DATA & KNOWLEDGE ENGINEERING, 2007, 61 (01) : 137 - 152