A new method to measure the semantic similarity from query phenotypic abnormalities to diseases based on the human phenotype ontology

Cited by: 12
Authors
Gong, Xiaofeng [1]
Jiang, Jianping [1]
Duan, Zhongqu [1]
Lu, Hui [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Bioinformat & Biostat, SJTU Yale Joint Ctr Biostat, Shanghai, Peoples R China
Keywords
Human phenotype ontology (HPO); Semantic similarity; Disease; Diagnosis; VARIANT PRIORITIZATION; GENE DISCOVERY; DIAGNOSTICS; TOOL
DOI
10.1186/s12859-018-2064-y
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Although rapidly developing sequencing technologies make it possible to use genotype data in clinical diagnosis, it remains challenging for clinicians to interpret sequencing results and make correct judgements based on them. Previously, diagnosis based on clinical features held a leading position. With the establishment of the Human Phenotype Ontology (HPO) and the enrichment of phenotype-disease annotations, much more attention has been paid to improving phenotype-based diagnosis. Results: In this study, we present a novel method called RelativeBestPair that measures the similarity from query terms to hereditary diseases based on the HPO and then ranks the candidate diseases. To evaluate its performance, we simulated a set of patients based on 44 complex diseases. In addition, by adding noise, imprecision, or both, we generated cases closer to real clinical conditions. Four simulated datasets were thus used to compare RelativeBestPair with seven existing semantic similarity measures. RelativeBestPair ranked the underlying disease first in 93.73% of cases in the dataset without noise or imprecision, 93.64% in the dataset with noise but without imprecision, 39.82% in the dataset without noise but with imprecision, and 33.64% in the dataset with both noise and imprecision. Conclusion: Compared with the seven existing semantic similarity measures, RelativeBestPair showed similar performance on the two datasets without imprecision. On the dataset without noise but with imprecision, RelativeBestPair performed on par with Resnik and better than the other six methods; on the dataset with both noise and imprecision, it significantly outperformed all seven other methods. These results indicate that RelativeBestPair might be of great help in clinical settings.
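The abstract does not spell out how RelativeBestPair is computed, so the following is only a minimal sketch of the general setting it describes: scoring query terms against annotated diseases with a Resnik-style best-match measure, ranking the candidates, and reporting the top-1 rate. It is not the authors' method. The toy ontology, the disease annotations, and all function names (`ancestors`, `information_content`, `resnik`, `query_to_disease`, `rank`, `top1_rate`) are assumptions made for illustration and do not correspond to real HPO identifiers or data.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (not real HPO IDs).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}

# Hypothetical disease annotations: disease -> set of annotated terms.
DISEASES = {
    "D1": {"A1", "B1"},
    "D2": {"A2"},
    "D3": {"B1"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log of the fraction of diseases annotated with t or a descendant."""
    n = sum(
        1
        for terms in DISEASES.values()
        if any(term in ancestors(t) for t in terms)
    )
    # Assumption: a term that annotates no disease gets IC 0 in this sketch.
    return -math.log(n / len(DISEASES)) if n else 0.0


def resnik(t1, t2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


def query_to_disease(query, disease_terms):
    """Asymmetric score: average, over query terms, of the best-matching disease term."""
    best = [max(resnik(q, d) for d in disease_terms) for q in query]
    return sum(best) / len(best)


def rank(query):
    """Return disease names ordered from most to least similar to the query."""
    scores = {name: query_to_disease(query, terms) for name, terms in DISEASES.items()}
    return sorted(scores, key=scores.get, reverse=True)


def top1_rate(cases):
    """Fraction of (query, true_disease) cases whose true disease is ranked first."""
    hits = sum(1 for query, truth in cases if rank(query)[0] == truth)
    return hits / len(cases)
```

Calling `rank(["A1", "B1"])` on this toy data places `D1` first, and `top1_rate` applied to a list of simulated `(query, true_disease)` pairs gives the kind of top-1 percentage the abstract reports. Replacing a query term with one of its ancestors is one simple way to mimic the imprecision described above.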
Pages: 9