Validation of Psoriatic Arthritis Diagnoses in Electronic Medical Records Using Natural Language Processing

Cited by: 25

Authors:

Love, Thorvardur Jon [1]

Cai, Tianxi [2]

Karlson, Elizabeth W. [1]

Affiliations:

[1] Harvard Univ, Brigham & Womens Hosp, Sch Med, Boston, MA 02115 USA

[2] Harvard Univ, Sch Publ Hlth, Boston, MA 02115 USA

Source:

SEMINARS IN ARTHRITIS AND RHEUMATISM | 2011 / Volume 40 / Issue 05

Keywords:

psoriatic arthritis; epidemiology; random forests; algorithm; natural language processing; electronic medical record; database; validation; locating; identifying; NLP; POSITIVE PREDICTIVE-VALUE; CLASSIFICATION CRITERIA; RANDOM FORESTS; SENSITIVITY; PREVALENCE; ACCURACY; VALIDITY;

DOI:

10.1016/j.semarthrit.2010.05.002

Chinese Library Classification (CLC) number:

R5 [Internal Medicine];

Discipline classification codes:

1002 ; 100201 ;

Abstract:

Objectives: To test whether data extracted from full-text patient visit notes in an electronic medical record would improve the classification of psoriatic arthritis (PsA) compared with an algorithm based on codified data alone.

Methods: From the >1,350,000 adults in a large academic electronic medical record, all 2318 patients with a billing code for PsA were extracted and 550 were randomly selected for chart review and algorithm training. Using codified data and phrases extracted from narrative data with natural language processing (NLP), 31 predictors were derived and 3 random forest algorithms were trained using coded, narrative, and combined predictors. The receiver operating characteristic (ROC) curve was used to identify the optimal algorithm, and a cut-point was chosen to achieve the maximum sensitivity possible at a 90% positive predictive value (PPV). The algorithm was then used to classify the remaining 1768 charts and was finally validated in a random sample of 300 cases predicted to have PsA.

Results: The PPV of a single PsA code was 57% (95% CI 55%-58%). Using a combination of coded data and NLP, the random forest algorithm reached a PPV of 90% (95% CI 86%-93%) at a sensitivity of 87% (95% CI 83%-91%) in the training data. The PPV was 93% (95% CI 89%-96%) in the validation set. Adding NLP predictors to codified data increased the area under the ROC curve (P < 0.001).

Conclusions: Using NLP with text notes from electronic medical records significantly improved the performance of the prediction algorithm. Random forests were a useful tool for accurately classifying psoriatic arthritis cases to enable epidemiological research.

© 2011 Elsevier Inc. All rights reserved. Semin Arthritis Rheum 40:413-420
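The cut-point rule described in the Methods (maximum sensitivity subject to a PPV of at least 90%) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `choose_cutpoint`, the toy scores, and the labels are hypothetical, and the scores merely stand in for the predicted probabilities a random forest would produce on chart-reviewed training data.

```python
from typing import List, Optional, Tuple


def choose_cutpoint(
    scores: List[float], labels: List[int], min_ppv: float = 0.90
) -> Optional[Tuple[float, float, float]]:
    """Return (cutpoint, sensitivity, ppv) with the highest sensitivity
    among cut-points whose PPV is at least min_ppv, or None if no
    cut-point qualifies. A case is predicted positive when score >= cutpoint.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return None  # sensitivity is undefined without true cases

    best = None
    # Every observed score is a candidate cut-point (hypothetical choice).
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        ppv = tp / (tp + fp)  # tp + fp >= 1 because cut is an observed score
        sens = tp / total_pos
        if ppv >= min_ppv and (best is None or sens > best[1]):
            best = (cut, sens, ppv)
    return best


# Toy data (made up for illustration): 1 = confirmed case, 0 = not a case.
toy_scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]
result = choose_cutpoint(toy_scores, toy_labels, min_ppv=0.90)
```

Lowering `min_ppv` trades precision for recall: on the toy data, relaxing the constraint to 0.80 admits a lower cut-point that captures every true case.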


Pages: 413-420

Page count: 8

50 related records in total

[21] Using Natural Language Processing on Electronic Health Records to Enhance Detection and Prediction of Psychosis Risk
Irving, Jessica
Patel, Rashmi
Oliver, Dominic
Colling, Craig
Pritchard, Megan
Broadbent, Matthew
Baldwin, Helen
Stahl, Daniel
Stewart, Robert
Fusar-Poli, Paolo
SCHIZOPHRENIA BULLETIN, 2021, 47 (02) : 405 - 414
[22] Efficient Queries of Stand-off Annotations for Natural Language Processing on Electronic Medical Records
Luo, Yuan
Szolovits, Peter
BIOMEDICAL INFORMATICS INSIGHTS, 2016, 8
[23] Natural language processing of electronic medical records identifies cardioprotective agents for anthracycline induced cardiotoxicity
Kawazoe, Yoshimasa
Tsuchiya, Masami
Shimamoto, Kiminori
Seki, Tomohisa
Shinohara, Emiko
Yada, Shuntaro
Wakamiya, Shoko
Imai, Shungo
Aramaki, Eiji
Hori, Satoko
SCIENTIFIC REPORTS, 2025, 15 (01):
[24] Electronic Medical Record Data Mining and Processing Based on Natural Language Processing
Zhang, Shichen
PROCEEDINGS OF 2024 INTERNATIONAL CONFERENCE ON MACHINE INTELLIGENCE AND DIGITAL APPLICATIONS, MIDA2024, 2024, : 212 - 217
[25] Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation
Zhao, Yiqing
Fu, Sunyang
Bielinski, Suzette J.
Decker, Paul A.
Chamberlain, Alanna M.
Roger, Veronique L.
Liu, Hongfang
Larson, Nicholas B.
JOURNAL OF MEDICAL INTERNET RESEARCH, 2021, 23 (03)
[26] Natural language processing to identify lupus nephritis phenotype in electronic health records
Deng, Yu
Pacheco, Jennifer A.
Ghosh, Anika
Chung, Anh
Mao, Chengsheng
Smith, Joshua C.
Zhao, Juan
Wei, Wei-Qi
Barnado, April
Dorn, Chad
Weng, Chunhua
Liu, Cong
Cordon, Adam
Yu, Jingzhi
Tedla, Yacob
Kho, Abel
Ramsey-Goldman, Rosalind
Walunas, Theresa
Luo, Yuan
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2024, 22 (SUPPL 2)
[27] Automated detection of sudden unexpected death in epilepsy risk factors in electronic medical records using natural language processing
Barbour, Kristen
Hesdorffer, Dale C.
Tian, Niu
Yozawitz, Elissa G.
McGoldrick, Patricia E.
Wolf, Steven
McDonough, Tiffani L.
Nelson, Aaron
Loddenkemper, Tobias
Basma, Natasha
Johnson, Stephen B.
Grinspan, Zachary M.
EPILEPSIA, 2019, 60 (06) : 1209 - 1220
[28] Applying Natural Language Processing Toolkits to Electronic Health Records - An Experience Report
Barrett, Neil
Weber-Jahnke, Jens H.
ADVANCES IN INFORMATION TECHNOLOGY AND COMMUNICATION IN HEALTH, 2009, 143 : 441 - 446
[29] Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study
Hu, Danqing
Li, Shaolei
Zhang, Huanyao
Wu, Nan
Lu, Xudong
JMIR MEDICAL INFORMATICS, 2022, 10 (04) : 153 - 170
[30] Improving Case Definition of Crohn's Disease and Ulcerative Colitis in Electronic Medical Records Using Natural Language Processing: A Novel Informatics Approach
Ananthakrishnan, Ashwin N.
Cai, Tianxi
Savova, Guergana
Cheng, Su-Chun
Chen, Pei
Perez, Raul Guzman
Gainer, Vivian S.
Murphy, Shawn N.
Szolovits, Peter
Xia, Zongqi
Shaw, Stanley
Churchill, Susanne
Karlson, Elizabeth W.
Kohane, Isaac
Plenge, Robert M.
Liao, Katherine P.
INFLAMMATORY BOWEL DISEASES, 2013, 19 (07) : 1411 - 1420
