Text Mining of Electronic Health Records Can Accurately Identify and Characterize Patients With Systemic Lupus Erythematosus

Cited by: 12
Authors
Brunekreef, Tammo E. [1 ]
Otten, Henny G. [1 ]
van den Bosch, Suzanne C. [1 ]
Hoefer, Imo E. [1 ]
van Laar, Jacob M. [1 ]
Limper, Maarten [1 ]
Haitjema, Saskia [1 ]
Affiliations
[1] Univ Utrecht, Univ Med Ctr Utrecht, Utrecht, Netherlands
Keywords
INFORMATION;
DOI
10.1002/acr2.11211
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Objective: Electronic health records (EHR) are increasingly being recognized as a major source of data reusable for medical research and quality monitoring, although patient identification and assessment of symptoms (characterization) remain challenging, especially in complex diseases such as systemic lupus erythematosus (SLE). Current coding systems are unable to assess information recorded in the physician's free-text notes. This study shows that text mining can be used as a reliable alternative.
Methods: In a multidisciplinary research team of data scientists and medical experts, a text mining algorithm on 4607 patient records was developed to assess the diagnosis of 14 different immune-mediated inflammatory diseases and the presence of 18 different symptoms in the EHR. The text mining algorithm included key words in the EHR, while mining the context for exclusion phrases. The accuracy of the text mining algorithm was assessed by manually checking the EHR of 100 random patients suspected of having SLE for diagnoses and symptoms and comparing the outcome with the outcome of the text mining algorithm.
Results: After evaluation of 100 patient records, the text mining algorithm had a sensitivity of 96.4% and a specificity of 93.3% in assessing the presence of SLE. The algorithm detected potentially life-threatening symptoms (nephritis, pleuritis) with good sensitivity (80%-82%) and high specificity (97%-97%).
Conclusion: We present a text mining algorithm that can accurately identify and characterize patients with SLE using routinely collected data from the EHR. Our study shows that using text mining, data from the EHR can be reused in research and quality control.
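The approach described in the abstract (key words combined with exclusion phrases in the surrounding context, then validation against manual review) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the keyword list, the exclusion phrases, the context window size, and the function names `mentions_diagnosis` and `sensitivity_specificity` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical term lists; the study's actual key words and exclusion
# phrases are not given in this record.
KEYWORDS = ["systemic lupus erythematosus", "sle"]
EXCLUSION_PHRASES = ["no evidence of", "no signs of", "ruled out", "unlikely"]
CONTEXT_CHARS = 30  # assumed size of the context window on each side of a hit


def mentions_diagnosis(note: str) -> bool:
    """Return True if a keyword occurs with no exclusion phrase nearby."""
    text = note.lower()
    for keyword in KEYWORDS:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        for match in re.finditer(pattern, text):
            start = max(0, match.start() - CONTEXT_CHARS)
            context = text[start:match.end() + CONTEXT_CHARS]
            if not any(phrase in context for phrase in EXCLUSION_PHRASES):
                return True
    return False


def sensitivity_specificity(predicted, actual):
    """Compare algorithm output with manual review labels.

    Assumes at least one positive and one negative manual label.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp / (tp + fn), tn / (tn + fp)


# Invented example notes with invented manual review labels.
notes = [
    "Patient known with SLE since 2010, now with nephritis.",
    "No evidence of systemic lupus erythematosus.",
    "SLE was ruled out after serology.",
    "Sleep problems reported.",
    "Mother has SLE.",  # family history: a false positive for this sketch
]
manual_labels = [True, False, False, False, False]

predicted = [mentions_diagnosis(n) for n in notes]
sensitivity, specificity = sensitivity_specificity(predicted, manual_labels)
```

The last note shows why context matters: a plain keyword hit on a family-history mention is counted as a false positive here, which lowers specificity unless a matching exclusion phrase is added.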
Pages: 65-71
Number of pages: 7