Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

Cited by: 25
Authors
Love, Thorvardur Jon [1]
Cai, Tianxi [2]
Karlson, Elizabeth W. [1]
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA
Keywords
psoriatic arthritis; epidemiology; random forests; algorithm; natural language processing; electronic medical record; database; validation; locating; identifying; NLP; POSITIVE PREDICTIVE-VALUE; CLASSIFICATION CRITERIA; RANDOM FORESTS; SENSITIVITY; PREVALENCE; ACCURACY; VALIDITY
DOI
10.1016/j.semarthrit.2010.05.002
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Objectives: To test whether data extracted from full text patient visit notes from an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data.
Methods: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data using natural language processing, 31 predictors were extracted and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operator curve was used to identify the optimal algorithm and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA.
Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the receiver operator curve (P < 0.001).
Conclusions: Using NLP with text notes from electronic medical records improved the performance of the prediction algorithm significantly. Random forests were a useful tool to accurately classify psoriatic arthritis cases to enable epidemiological research.
© 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 40:413-420
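The cut-point rule described in the Methods (maximum sensitivity at a 90% PPV) can be illustrated with a minimal sketch. This is not the authors' code: the function name `choose_cut_point`, the toy scores, and the toy chart-review labels are all hypothetical, and the scores stand in for whatever probability a trained classifier such as a random forest would output.

```python
from typing import List, Optional, Tuple


def choose_cut_point(
    scores: List[float],
    labels: List[int],
    min_ppv: float = 0.90,
) -> Optional[Tuple[float, float, float]]:
    """Return (cut_point, ppv, sensitivity) for the threshold with the
    highest sensitivity among thresholds whose PPV is at least min_ppv.

    A patient is classified as a case when score >= cut_point.
    Returns None if there are no true cases or no threshold qualifies.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return None

    best: Optional[Tuple[float, float, float]] = None
    # Try every observed score as a candidate threshold, highest first.
    for cut in sorted(set(scores), reverse=True):
        predicted = [y for s, y in zip(scores, labels) if s >= cut]
        true_pos = sum(predicted)
        ppv = true_pos / len(predicted)       # precision among predicted cases
        sensitivity = true_pos / total_pos    # recall among true cases
        if ppv >= min_ppv and (best is None or sensitivity > best[2]):
            best = (cut, ppv, sensitivity)
    return best


# Hypothetical toy data: classifier scores and chart-review labels (1 = PsA).
toy_scores = [0.99, 0.95, 0.90, 0.85, 0.80, 0.75,
              0.70, 0.65, 0.60, 0.55, 0.50, 0.45]
toy_labels = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]

result = choose_cut_point(toy_scores, toy_labels, min_ppv=0.90)
if result is not None:
    cut, ppv, sens = result
    print(f"cut-point={cut:.2f}  PPV={ppv:.3f}  sensitivity={sens:.3f}")
```

Lowering the cut-point admits more predicted cases, which raises sensitivity but usually lowers PPV; fixing a PPV floor and then maximizing sensitivity is one simple way to resolve that trade-off.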
Pages: 413-420
Number of pages: 8
Related papers
50 in total
  • [1] Can Patients with Dementia Be Identified in Primary Care Electronic Medical Records Using Natural Language Processing?
    Maclagan, Laura C.
    Abdalla, Mohamed
    Harris, Daniel A.
    Stukel, Therese A.
    Chen, Branson
    Candido, Elisa
    Swartz, Richard H.
    Iaboni, Andrea
    Jaakkimainen, R. Liisa
    Bronskill, Susan E.
    JOURNAL OF HEALTHCARE INFORMATICS RESEARCH, 2023, 7 (01) : 42 - 58
  • [2] Detecting inpatient falls by using natural language processing of electronic medical records
    Toyabe, Shin-ichi
    BMC HEALTH SERVICES RESEARCH, 2012, 12
  • [4] Development of phenotype algorithms using electronic medical records and incorporating natural language processing
    Liao, Katherine P.
    Cai, Tianxi
    Savova, Guergana K.
    Murphy, Shawn N.
    Karlson, Elizabeth W.
    Ananthakrishnan, Ashwin N.
    Gainer, Vivian S.
    Shaw, Stanley Y.
    Xia, Zongqi
    Szolovits, Peter
    Churchill, Susanne
    Kohane, Isaac
    BMJ-BRITISH MEDICAL JOURNAL, 2015, 350
  • [5] Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore
    Hardjojo, Antony
    Gunachandran, Arunan
    Pang, Long
    Bin Abdullah, Mohammed Ridzwan
    Wah, Win
    Chong, Joash Wen Chen
    Goh, Ee Hui
    Teo, Sok Huang
    Lim, Gilbert
    Lee, Mong Li
    Hsu, Wynne
    Lee, Vernon
    Chen, Mark I-Cheng
    Wong, Franco
    Phang, Jonathan Siung King
    JMIR MEDICAL INFORMATICS, 2018, 6 (02) : 45 - 59
  • [6] Natural Language Processing to Identify Cancer Treatments With Electronic Medical Records
    Zeng, Jiaming
    Banerjee, Imon
    Henry, A. Solomon
    Wood, Douglas J.
    Shachter, Ross D.
    Gensheimer, Michael F.
    Rubin, Daniel L.
    JCO CLINICAL CANCER INFORMATICS, 2021, 5 : 379 - 393
  • [7] CliniViewer: A tool for viewing electronic medical records based on natural language processing and XML
    Liu, Hongfang
    Friedman, Carol
    Studies in Health Technology and Informatics, 2004, 107 : 639 - 643
  • [8] Incorporating natural language processing to improve classification of axial spondyloarthritis using electronic health records
    Zhao, Sizheng Steven
    Hong, Chuan
    Cai, Tianrun
    Xu, Chang
    Huang, Jie
    Ermann, Joerg
    Goodson, Nicola J.
    Solomon, Daniel H.
    Cai, Tianxi
    Liao, Katherine P.
    RHEUMATOLOGY, 2020, 59 (05) : 1059 - 1065
  • [9] Natural Language Processing to Improve Prediction of Incident Atrial Fibrillation Using Electronic Health Records
    Ashburner, Jeffrey M.
    Chang, Yuchiao
    Wang, Xin
    Khurshid, Shaan
    Anderson, Christopher D.
    Dahal, Kumar
    Weisenfeld, Dana
    Cai, Tianrun
    Liao, Katherine P.
    Wagholikar, Kavishwar B.
    Murphy, Shawn N.
    Atlas, Steven J.
    Lubitz, Steven A.
    Singer, Daniel E.
    JOURNAL OF THE AMERICAN HEART ASSOCIATION, 2022, 11 (15):
  • [10] Using the Natural Language Processing System Medical Named Entity Recognition-Japanese to Analyze Pharmaceutical Care Records: Natural Language Processing Analysis
    Ohno, Yukiko
    Kato, Riri
    Ishikawa, Haruki
    Nishiyama, Tomohiro
    Isawa, Minae
    Mochizuki, Mayumi
    Aramaki, Eiji
    Aomori, Tohru
    JMIR FORMATIVE RESEARCH, 2024, 8