Development and validation of a rheumatoid arthritis case definition: a machine learning approach using data from primary care electronic medical records

Times cited: 0
Authors
Pham, Anh N. Q. [1,2,3,4]
Barber, Claire E. H. [2,3]
Drummond, Neil [2,3,5]
Jasper, Lisa [6]
Klein, Doug [5]
Lindeman, Cliff [7]
Widdifield, Jessica [8,9]
Williamson, Tyler [2,3]
Jones, C. Allyson [6]
Affiliations
[1] Simon Fraser Univ, Dept Hlth Sci, Burnaby, BC, Canada
[2] Univ Calgary, Dept Med, Calgary, AB, Canada
[3] Univ Calgary, Dept Community Hlth Sci, Calgary, AB, Canada
[4] Simon Fraser Univ, Pacific Inst Pathogen Pandem & Soc, Burnaby, BC, Canada
[5] Univ Alberta, Dept Family Med, Edmonton, AB, Canada
[6] Univ Alberta, Fac Rehabil Med, Edmonton, AB, Canada
[7] Coll Phys & Surg Alberta, Edmonton, AB, Canada
[8] Sunnybrook Res Inst, Holland Bone & Joint Res Program, Toronto, ON, Canada
[9] Univ Toronto, Inst Hlth Policy Management & Evaluat, ICES, Toronto, ON, Canada
Keywords
Rheumatoid arthritis; Case definition; EMR phenotyping; Electronic medical records; Machine learning; Surveillance
DOI
10.1186/s12911-024-02776-w
Chinese Library Classification number
R-058
Abstract
Background
Rheumatoid arthritis (RA) is a chronic inflammatory disease that is primarily diagnosed and managed by rheumatologists; however, it is often primary care providers who first encounter RA-related symptoms. This study developed and validated a case definition for RA using national surveillance data from primary care settings.

Methods
This cross-sectional validation study used structured electronic medical record (EMR) data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). A reference set was generated through EMR reviews by five experts. Three machine learning steps were then applied: (1) feature generation using a 'bag-of-words' approach; (2) feature reduction using a feature importance measure coupled with recursive feature elimination and clustering; and (3) classification using tree-based methods (Decision Tree, Random Forest, and Extreme Gradient Boosting [XGBoost]). The three tree-based algorithms were compared to identify the one that yielded the optimal evaluation metrics. Nested cross-validation was used so that the models could be tuned, evaluated, and compared simultaneously.

Results
Of 1.3 million patients from seven Canadian provinces, 5,600 people aged 19 and older were randomly selected. The XGBoost classification method generated the optimal algorithm for identifying RA cases. Based on the feature importance scores in the XGBoost output, a human-readable case definition was created: a patient is identified as an RA case when the text "rheumatoid" occurs at least twice in any of the billing, encounter diagnosis, or health condition tables of the patient chart. The final case definition had a sensitivity of 81.6% (95% CI, 75.6-86.4), specificity of 98.0% (95% CI, 97.4-98.5), positive predictive value of 76.3% (95% CI, 70.1-81.5), and negative predictive value of 98.6% (95% CI, 98.0-98.6).

Conclusion
A case definition for RA using primary care EMR data was developed based on the XGBoost algorithm. With high validity metrics, this case definition is expected to be a reliable tool for future epidemiological research and surveillance investigating the management of RA in the CPCSSN dataset.
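The 'bag-of-words' feature generation step can be pictured as counting the words that appear in each patient's structured EMR text fields. The sketch below is a minimal illustration under stated assumptions: each patient is a hypothetical dict mapping a table name (e.g. `billing`) to a list of text entries, tokens are lowercase alphanumeric runs, and each token is prefixed with its table name. None of these details, nor the names `bag_of_words` and `build_matrix`, come from the study; the actual CPCSSN schema and the authors' tokenization may differ.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase alphanumeric runs (not the authors' stated method).
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def bag_of_words(patient, prefix_with_table=True):
    """Count tokens across one patient's tables.

    `patient` is a hypothetical dict: table name -> list of text entries.
    Prefixing tokens with the table name (an assumption) keeps, e.g.,
    'billing:rheumatoid' and 'health_condition:rheumatoid' as separate features.
    """
    counts = Counter()
    for table, entries in patient.items():
        for entry in entries:
            for token in tokenize(entry):
                key = f"{table}:{token}" if prefix_with_table else token
                counts[key] += 1
    return counts


def build_matrix(patients):
    """Build a dense patient-by-term count matrix over a shared vocabulary."""
    bags = [bag_of_words(p) for p in patients]
    vocabulary = sorted(set().union(*bags))
    matrix = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, matrix
```

A count matrix like this can be passed directly to tree-based classifiers such as those named in the abstract; the subsequent feature-reduction step (feature importance with recursive feature elimination and clustering) is not shown here.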
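Nested cross-validation keeps hyperparameter tuning separate from performance estimation: an inner loop chooses settings using only the outer-training data, and the outer loop scores the tuned model on data that played no part in tuning. The pure-Python sketch below assumes generic `fit` and `score` callables, shuffled (unstratified) folds, and 5 outer by 3 inner folds; the abstract does not report fold counts. The single-feature threshold "model" is a hypothetical stand-in for the Decision Tree, Random Forest, and XGBoost classifiers, which in practice would come from dedicated libraries.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle positions 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nested_cv(X, y, param_grid, fit, score, outer_k=5, inner_k=3, seed=0):
    """Return one held-out score per outer fold."""
    outer_scores = []
    for i, test_idx in enumerate(k_fold_indices(len(X), outer_k, seed)):
        test_set = set(test_idx)
        train_idx = [j for j in range(len(X)) if j not in test_set]
        inner_folds = k_fold_indices(len(train_idx), inner_k, seed + i + 1)

        def inner_score(params):
            # Mean validation score of one candidate setting, using outer-training data only.
            results = []
            for fold in inner_folds:
                val_idx = [train_idx[j] for j in fold]
                val_set = set(val_idx)
                fit_idx = [j for j in train_idx if j not in val_set]
                model = fit([X[j] for j in fit_idx], [y[j] for j in fit_idx], params)
                results.append(score(model, [X[j] for j in val_idx], [y[j] for j in val_idx]))
            return mean(results)

        best = max(param_grid, key=inner_score)
        # Refit on the full outer-training set, then score on the untouched outer fold.
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx], best)
        outer_scores.append(score(model, [X[j] for j in test_idx], [y[j] for j in test_idx]))
    return outer_scores


def fit_threshold(X_train, y_train, threshold):
    """Hypothetical stand-in model: predict a case when the feature >= threshold."""
    return threshold


def accuracy(threshold, X_eval, y_eval):
    """Fraction of correct predictions for the threshold model."""
    correct = sum((x >= threshold) == t for x, t in zip(X_eval, y_eval))
    return correct / len(y_eval)
```

Comparing several algorithms, as the study did, amounts to running this loop once per algorithm and comparing the resulting outer-fold scores.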
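The final human-readable case definition is simple enough to apply directly to chart data. The sketch below encodes it under two assumptions that the abstract leaves open: each record whose text contains "rheumatoid" (case-insensitive) counts as one occurrence, and occurrences are pooled across the three tables. The table identifiers and the patient layout (a dict mapping table name to a list of text entries) are hypothetical stand-ins, not actual CPCSSN names.

```python
# Hypothetical identifiers for the three tables named in the case definition.
CASE_TABLES = ("billing", "encounter_diagnosis", "health_condition")
SEARCH_TERM = "rheumatoid"


def count_mentions(patient, tables=CASE_TABLES, term=SEARCH_TERM):
    """Count records in the given tables whose text contains `term`
    (case-insensitive). Each matching record counts once and matches are
    pooled across tables; both choices are assumptions."""
    return sum(
        1
        for table in tables
        for entry in patient.get(table, [])
        if term in entry.lower()
    )


def is_ra_case(patient, min_mentions=2):
    """Flag an RA case when 'rheumatoid' appears in at least two records of the
    billing, encounter diagnosis, or health condition tables."""
    return count_mentions(patient) >= min_mentions
```

Mentions in other tables (for example, a hypothetical `medication` table) are ignored, matching the three tables listed in the definition.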
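The four validity metrics reported for the case definition all come from the 2x2 table that compares the definition with the expert reference set: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), PPV = TP/(TP+FP), and NPV = TN/(TN+FN). The sketch below computes them with Wilson score intervals. The abstract does not say which interval method was used, so Wilson is an assumption; an exact (Clopper-Pearson) interval would give slightly different bounds. The counts in the check are illustrative only and are not the study's data.

```python
import math
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion successes/n."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def validity_metrics(tp, fp, fn, tn, level=0.95):
    """Return {metric: (estimate, lower, upper)} for a case definition
    evaluated against a reference standard."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, *wilson_ci(k, n, level)) for name, (k, n) in pairs.items()}
```

Because PPV and NPV depend on how common RA is in the sample, they transfer to other populations less directly than sensitivity and specificity do.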
Pages: 9