Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

Times cited: 25
Authors
Love, Thorvardur Jon [1]
Cai, Tianxi [2]
Karlson, Elizabeth W. [1]
Affiliations
[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA
[2] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA
Keywords
psoriatic arthritis; epidemiology; random forests; algorithm; natural language processing; electronic medical record; database; validation; locating; identifying; NLP; POSITIVE PREDICTIVE-VALUE; CLASSIFICATION CRITERIA; RANDOM FORESTS; SENSITIVITY; PREVALENCE; ACCURACY; VALIDITY;
DOI
10.1016/j.semarthrit.2010.05.002
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Objectives: To test whether data extracted from full-text patient visit notes in an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data.
Methods: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted, and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data with natural language processing, 31 predictors were extracted, and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm, and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and finally validated in a random sample of 300 cases predicted to have PsA.
Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and natural language processing (NLP), the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC curve (P < 0.001).
Conclusions: Using NLP with text notes from electronic medical records significantly improved the performance of the prediction algorithm. Random forests were a useful tool for accurately classifying psoriatic arthritis cases to enable epidemiological research.
(C) 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 40:413-420
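The cut-point rule described in the Methods (take the maximum sensitivity among thresholds whose PPV is at least 90%) and the area-under-the-ROC-curve comparison can be illustrated with a short sketch. It assumes each chart already has a predicted probability from some classifier (for example, a random forest) and a chart-review label (1 = PsA, 0 = not PsA). The function names `sens_ppv`, `choose_cut_point` and `auc` are hypothetical, the toy scores are invented for illustration and are not data from the study, and AUC is computed here with the rank-based (Mann-Whitney) formulation, which is not necessarily the software or method the authors used.

```python
from typing import Optional, Sequence, Tuple


def sens_ppv(scores: Sequence[float], labels: Sequence[int],
             threshold: float) -> Tuple[float, float]:
    """Sensitivity and PPV when every chart with score >= threshold is called a case."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        if s >= threshold:
            if y == 1:
                tp += 1  # predicted case, confirmed by chart review
            else:
                fp += 1  # predicted case, not confirmed
        elif y == 1:
            fn += 1      # true case missed by the cut-point
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


def choose_cut_point(scores: Sequence[float], labels: Sequence[int],
                     min_ppv: float = 0.90) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, sensitivity, ppv) with maximal sensitivity subject to
    PPV >= min_ppv, or None if no threshold reaches the required PPV."""
    best = None
    # Ascending thresholds: lower cut-points call more charts positive.
    for t in sorted(set(scores)):
        sens, ppv = sens_ppv(scores, labels, t)
        if ppv >= min_ppv and (best is None or sens > best[1]):
            best = (t, sens, ppv)
    return best


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve as the probability that a random positive
    outscores a random negative (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): classifier scores and review labels.
SCORES = [0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
LABELS = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    print(choose_cut_point(SCORES, LABELS, min_ppv=0.90))
    print(auc(SCORES, LABELS))
```

With the toy data, the 90% PPV constraint selects the threshold 0.8 (sensitivity 4/6, PPV 1.0), while relaxing the constraint to 80% moves the cut-point down to 0.6 and raises sensitivity to 5/6; the AUC is 20/24, about 0.83. Comparing `auc` across models trained on coded, narrative, and combined predictors mirrors the kind of comparison the abstract reports, although a formal significance test would require an additional method not shown here.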
Pages: 413-420
Number of pages: 8
Related papers
50 in total
  • [31] Ascertainment of Delirium Status Using Natural Language Processing From Electronic Health Records
    Fu, Sunyang
    Lopes, Guilherme S.
    Pagali, Sandeep R.
    Thorsteinsdottir, Bjoerg
    LeBrasseur, Nathan K.
    Wen, Andrew
    Liu, Hongfang
    Rocca, Walter A.
    Olson, Janet E.
    St Sauver, Jennifer
    Sohn, Sunghwan
    JOURNALS OF GERONTOLOGY SERIES A-BIOLOGICAL SCIENCES AND MEDICAL SCIENCES, 2022, 77 (03) : 524 - 530
  • [32] Using a natural language processing toolkit to classify electronic health records by psychiatric diagnosis
    Hutto, Alissa
    Zikry, Tarek M.
    Bohac, Buck
    Rose, Terra
    Staebler, Jasmine
    Slay, Janet
    Cheever, C. Ray
    Kosorok, Michael R.
    Nash, Rebekah P.
    HEALTH INFORMATICS JOURNAL, 2024, 30 (04)
  • [33] Myocardial infarction and the validation of physician billing and hospitalization data using electronic medical records
    Tu, K.
    Mitiku, T.
    Guo, H.
    Lee, D. S.
    Tu, J. V.
    CHRONIC DISEASES IN CANADA, 2010, 30 (04) : 141 - 146
  • [34] Validation of Case Finding Algorithms for Hepatocellular Cancer From Administrative Data and Electronic Health Records Using Natural Language Processing
    Sada, Yvonne
    Hou, Jason
    Richardson, Peter
    El-Serag, Hashem
    Davila, Jessica
    MEDICAL CARE, 2016, 54 (02) : E9 - E14
  • [35] Annotation methods to develop and evaluate an expert system based on natural language processing in electronic medical records
    Gicquel, Quentin
    Tvardik, Nastassia
    Bouvry, Come
    Kergourlay, Ivan
    Bittar, Andre
    Segond, Frederique
    Darmoni, Stefan
    Metzger, Marie-Helene
    MEDINFO 2015: EHEALTH-ENABLED HEALTH, 2015, 216 : 1067 - 1067
  • [36] Extracting social determinants of health from electronic health records using natural language processing: a systematic review
    Patra, Braja G.
    Sharma, Mohit M.
    Vekaria, Veer
    Adekkanattu, Prakash
    Patterson, Olga, V
    Glicksberg, Benjamin
    Lepow, Lauren A.
    Ryu, Euijung
    Biernacka, Joanna M.
    Furmanchuk, Al'ona
    George, Thomas J.
    Hogan, William
    Wu, Yonghui
    Yang, Xi
    Bian, Jiang
    Weissman, Myrna
    Wickramaratne, Priya
    Mann, J. John
    Olfson, Mark
    Campion, Thomas R., Jr.
    Weiner, Mark
    Pathak, Jyotishman
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2021, 28 (12) : 2716 - 2727
  • [37] Predicting Inpatient Falls Using Natural Language Processing of Nursing Records Obtained From Japanese Electronic Medical Records: Case-Control Study
    Nakatani, Hayao
    Nakao, Masatoshi
    Uchiyama, Hidefumi
    Toyoshiba, Hiroyoshi
    Ochiai, Chikayuki
    JMIR MEDICAL INFORMATICS, 2020, 8 (04)
  • [38] Identification of recurrent atrial fibrillation using natural language processing applied to electronic health records
    Zheng, Chengyi
    Lee, Ming-sum
    Bansal, Nisha
    Go, Alan S.
    Chen, Cheng
    Harrison, Teresa N.
    Fan, Dongjie
    Allen, Amanda
    Garcia, Elisha
    Lidgard, Ben
    Singer, Daniel
    An, Jaejin
    EUROPEAN HEART JOURNAL-QUALITY OF CARE AND CLINICAL OUTCOMES, 2024, 10 (01) : 77 - 88
  • [39] Advancements and gaps in natural language processing and machine learning applications in healthcare: a comprehensive review of electronic medical records and medical imaging
    Khalate, Priyanka
    Gite, Shilpa
    Pradhan, Biswajeet
    Lee, Chang-Wook
    FRONTIERS IN PHYSICS, 2024, 12
  • [40] Automated derivation of diagnostic criteria for lung cancer using natural language processing on electronic health records: a pilot study
    Houston, Andrew
    Williams, Sophie
    Ricketts, William
    Gutteridge, Charles
    Tackaberry, Chris
    Conibear, John
    BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 24 (01)