From disease ontology to disease-ontology lite: statistical methods to adapt a general-purpose ontology for the test of gene-ontology associations

Cited by: 81
Authors
Du, Pan [1]
Feng, Gang [1]
Flatow, Jared [1]
Song, Jie [2]
Holko, Michelle [3]
Kibbe, Warren A. [1]
Lin, Simon M. [1]
Affiliations
[1] Northwestern Univ, Biomed Informat Ctr, Chicago, IL 60611 USA
[2] Univ Chicago, Dept Pathol, Chicago, IL 60637 USA
[3] Northwestern Univ, Dept Prevent Med, Chicago, IL 60611 USA
Keywords
SEMANTIC SIMILARITY; DATABASE
DOI
10.1093/bioinformatics/btp193
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Subjective methods have been reported to adapt a general-purpose ontology for a specific application. For example, Gene Ontology (GO) Slim was created from GO to generate a highly aggregated report of the human-genome annotation. We propose statistical methods to adapt the general-purpose OBO Foundry Disease Ontology (DO) for the identification of gene-disease associations. Thus, we need a simplified definition of disease categories derived from implicated genes. On the basis of the assumption that DO terms having similar associated genes are closely related, we group the DO terms based on the similarity of their gene-to-DO mapping profiles. Two types of binary distance metrics are defined to measure the overall and subset similarity between DO terms. A compactness-scalable fuzzy clustering method is then applied to group similar DO terms. To reduce false clustering, the semantic similarities between DO terms are also used to constrain the clustering results. As such, the DO terms are aggregated and the redundant DO terms are largely removed. Using these methods, we constructed a simplified vocabulary list from the DO called Disease Ontology Lite (DOLite). We demonstrated that DOLite yields more interpretable results than DO for gene-disease association tests. The resultant DOLite has been used in the Functional Disease Ontology (FunDO) Web application at http://www.projects.bioinformatics.northwestern.edu/fundo.
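The abstract does not state the formulas for the two binary distance metrics, so the following Python sketch is only an illustration under stated assumptions. It uses the Jaccard distance as a stand-in for "overall" similarity and an overlap-coefficient distance as a stand-in for "subset" similarity; neither is confirmed as the definition used in the paper. The function names and the gene-to-term profiles are hypothetical toy data.

```python
def overall_distance(genes_a, genes_b):
    """Assumed 'overall' metric: Jaccard distance, 1 - |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def subset_distance(genes_a, genes_b):
    """Assumed 'subset' metric: 1 - |A & B| / min(|A|, |B|).

    It is 0 whenever one gene set is fully contained in the other.
    """
    a, b = set(genes_a), set(genes_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        # An empty profile shares nothing with a non-empty one.
        return 0.0 if not (a or b) else 1.0
    return 1.0 - len(a & b) / smaller


# Hypothetical gene-to-DO mapping profiles (toy data, not from the paper).
profiles = {
    "breast cancer": {"BRCA1", "BRCA2", "TP53", "ERBB2"},
    "breast carcinoma": {"BRCA1", "BRCA2", "TP53"},
    "asthma": {"IL4", "IL13", "ADRB2"},
}

if __name__ == "__main__":
    terms = sorted(profiles)
    for i, t1 in enumerate(terms):
        for t2 in terms[i + 1:]:
            print(t1, "|", t2,
                  "| overall:", overall_distance(profiles[t1], profiles[t2]),
                  "| subset:", subset_distance(profiles[t1], profiles[t2]))
```

Under these assumed definitions, a term whose gene set is nested inside another's gets a subset distance of 0 even when the overall distance is positive, which is the kind of redundancy a clustering step could then merge. The fuzzy clustering and semantic-similarity constraint described in the abstract are not sketched here.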
Pages: I63-I68
Number of pages: 6