Learning a Health Knowledge Graph from Electronic Medical Records

Times cited: 282
Authors
Rotmensch, Maya [1]
Halpern, Yoni [2]
Tlimat, Abdulhakim [3]
Horng, Steven [3,4]
Sontag, David [5,6]
Affiliations
[1] NYU, Ctr Data Sci, New York, NY USA
[2] NYU, Dept Comp Sci, New York, NY USA
[3] Beth Israel Deaconess Med Ctr, Dept Emergency Med, Boston, MA 02215 USA
[4] Beth Israel Deaconess Med Ctr, Div Clin Informat, Boston, MA 02215 USA
[5] MIT, Dept Elect Engn & Comp Sci, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[6] MIT, Inst Med Engn & Sci, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
DIAGNOSIS; PROGRAM
DOI
10.1038/s41598-017-05778-z
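Chinese Library Classification (CLC) numbers
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09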
Abstract
Demand for clinical decision support systems in medicine and for self-diagnostic symptom checkers has increased substantially in recent years. Existing platforms rely on knowledge bases that are either compiled manually through a labor-intensive process or derived automatically using simple pairwise statistics. This study explored an automated process for learning high-quality knowledge bases linking diseases and symptoms directly from electronic medical records. Medical concepts were extracted from 273,174 de-identified patient records, and knowledge graphs were constructed automatically by maximum likelihood estimation of three probabilistic models: logistic regression, a naive Bayes classifier, and a Bayesian network using noisy-OR gates. A graph of disease-symptom relationships was elicited from the learned parameters, and the constructed knowledge graphs were evaluated and validated, with permission, against Google's manually constructed knowledge graph and against expert physician opinions. Our study shows that direct, automated construction of high-quality health knowledge graphs from medical records using rudimentary concept extraction is feasible. The noisy-OR model produces a high-quality knowledge graph, reaching a precision of 0.85 at a recall of 0.6 in the clinical evaluation. Noisy-OR significantly outperforms all other tested models across evaluation frameworks (p < 0.01).
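To make the noisy-OR idea and the precision/recall evaluation concrete, here is a minimal sketch. It is not the authors' implementation. The disease and symptom names, the activation and leak probabilities, the reference edge set, and the helper names `p_symptom`, `extract_edges`, and `precision_recall` are all hypothetical. Reading edges off the parameters with a fixed threshold is likewise an assumed, simplified rule; the paper's actual elicitation and evaluation procedure may differ.

```python
from math import prod

# Hypothetical learned parameters (illustrative only, not from the paper):
# activation[(disease, symptom)] = probability the disease alone turns the symptom on;
# leak[symptom] = probability the symptom appears with no modeled disease present.
activation = {
    ("influenza", "fever"): 0.8,
    ("influenza", "cough"): 0.6,
    ("pneumonia", "fever"): 0.7,
    ("pneumonia", "cough"): 0.75,
    ("influenza", "rash"): 0.01,
}
leak = {"fever": 0.05, "cough": 0.05, "rash": 0.02}


def p_symptom(symptom, present_diseases):
    """Noisy-OR gate: P(symptom present | the given diseases are present)."""
    # The symptom stays absent only if the leak and every present parent
    # independently fail to activate it.
    p_absent = (1.0 - leak[symptom]) * prod(
        1.0 - activation.get((d, symptom), 0.0) for d in present_diseases
    )
    return 1.0 - p_absent


def extract_edges(threshold):
    """Assumed rule: keep disease-symptom edges whose activation reaches the threshold."""
    return {edge for edge, p in activation.items() if p >= threshold}


def precision_recall(predicted, reference):
    """Precision and recall of a predicted edge set against a reference edge set."""
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Both hypothetical diseases present: 1 - 0.95 * 0.2 * 0.3
    print(round(p_symptom("fever", {"influenza", "pneumonia"}), 4))
    edges = extract_edges(threshold=0.5)
    # Hypothetical "gold standard" edges standing in for a reference graph.
    reference = {("influenza", "fever"), ("pneumonia", "cough"), ("pneumonia", "fever")}
    print(precision_recall(edges, reference))
```

Sweeping `threshold` from high to low traces out a precision-recall curve, which is the kind of trade-off summarized by an operating point such as "precision of 0.85 at a recall of 0.6".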
Pages: 11