Text-mining approach to evaluate terms for ontology development

Cited by: 16
Authors
Tsoi, Lam C. [1]
Patel, Ravi [1]
Zhao, Wenle [1]
Zheng, W. Jim [1]
Affiliations
[1] Med Univ S Carolina, Dept Biostat Bioinformat & Epidemiol, Bioinformat Grad Program, Charleston, SC 29464 USA
Keywords
Ontology development; Hypergeometric test; PubMed; Text mining; GENE-ONTOLOGY; MICROARRAY DATA; INFORMATION; ANNOTATION; SOFTWARE; DOMAIN; TOOL; GO
DOI
10.1016/j.jbi.2009.03.009
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Developing ontologies to account for the complexity of biological systems requires the time-intensive collaboration of many participants with expertise in various fields. While each participant may contribute to constructing a list of terms for ontology development, no objective methods have been developed to evaluate how relevant each of these terms is to the intended domain. We have developed a computational method based on a hypergeometric enrichment test to evaluate the relevance of such terms to the intended domain. The proposed method uses the PubMed literature database to evaluate whether each potential term for ontology development is overrepresented in the abstracts that discuss the particular domain. This evaluation provides an objective approach to assess terms and prioritize them for ontology development. (C) 2009 Elsevier Inc. All rights reserved.
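As an illustration of the kind of test the abstract describes, the following is a minimal sketch of a one-sided hypergeometric enrichment test. It is not the authors' implementation. The function name, the parameter names, and the counts in the usage lines are hypothetical, and the sketch assumes the counts of abstracts (all abstracts, abstracts mentioning the term, abstracts about the domain, and abstracts in both groups) have already been obtained from a literature search.

```python
from math import comb


def hypergeom_enrichment_pvalue(total, with_term, domain, domain_with_term):
    """Upper-tail hypergeometric probability P(X >= domain_with_term).

    total            -- number of abstracts in the background collection
    with_term        -- abstracts in the collection that mention the term
    domain           -- abstracts in the collection that discuss the domain
    domain_with_term -- abstracts that discuss the domain AND mention the term

    All names are illustrative assumptions, not taken from the paper.
    """
    if not (0 <= with_term <= total and 0 <= domain <= total):
        raise ValueError("counts must satisfy 0 <= with_term, domain <= total")

    # X cannot exceed either the term count or the domain count.
    upper = min(with_term, domain)
    if domain_with_term > upper:
        return 0.0
    lower = max(domain_with_term, 0)

    # Sum the hypergeometric probability mass over the upper tail.
    # math.comb returns 0 when k > n, so impossible outcomes add nothing.
    numerator = sum(
        comb(with_term, i) * comb(total - with_term, domain - i)
        for i in range(lower, upper + 1)
    )
    return numerator / comb(total, domain)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p = hypergeom_enrichment_pvalue(
        total=10_000, with_term=200, domain=500, domain_with_term=40
    )
    print(f"enrichment p-value: {p:.3g}")
```

A small p-value suggests the term appears in domain abstracts more often than chance alone would predict, which is one way candidate terms could be ranked. Exact integer arithmetic is used here for clarity; for counts on the scale of the full PubMed database, a statistics library working in log space would likely be more practical.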
Pages: 824-830
Page count: 7