Development and validation of a rheumatoid arthritis case definition: a machine learning approach using data from primary care electronic medical records

Citations: 0
Authors
Pham, Anh N. Q. [1, 2, 3, 4]
Barber, Claire E. H. [2, 3]
Drummond, Neil [2, 3, 5]
Jasper, Lisa [6]
Klein, Doug [5]
Lindeman, Cliff [7]
Widdifield, Jessica [8, 9]
Williamson, Tyler [2, 3]
Jones, C. Allyson [6]
Affiliations
[1] Simon Fraser Univ, Dept Hlth Sci, Burnaby, BC, Canada
[2] Univ Calgary, Dept Med, Calgary, AB, Canada
[3] Univ Calgary, Dept Community Hlth Sci, Calgary, AB, Canada
[4] Simon Fraser Univ, Pacific Inst Pathogen Pandem & Soc, Burnaby, BC, Canada
[5] Univ Alberta, Dept Family Med, Edmonton, AB, Canada
[6] Univ Alberta, Fac Rehabil Med, Edmonton, AB, Canada
[7] Coll Phys & Surg Alberta, Edmonton, AB, Canada
[8] Sunnybrook Res Inst, Holland Bone & Joint Res Program, Toronto, ON, Canada
[9] Univ Toronto, Inst Hlth Policy Management & Evaluat, ICES, Toronto, ON, Canada
Keywords
Rheumatoid arthritis; Case definition; EMR phenotyping; Electronic medical records; Machine learning; Surveillance
DOI
10.1186/s12911-024-02776-w
Chinese Library Classification number
R-058
Subject classification code
Abstract
Background: Rheumatoid arthritis (RA) is a chronic inflammatory disease that is primarily diagnosed and managed by rheumatologists; however, it is often primary care providers who first encounter RA-related symptoms. This study developed and validated a case definition for RA using national surveillance data in primary care settings.
Methods: This cross-sectional validation study used structured electronic medical record (EMR) data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Based on a reference set generated by EMR reviews by five experts, three machine learning steps were applied: a 'bag-of-words' approach to feature generation; feature reduction using a feature importance measure coupled with recursive feature elimination and clustering; and classification using tree-based methods (Decision Tree, Random Forest, and Extreme Gradient Boosting [XGBoost]). The three tree-based algorithms were compared to identify the procedure that generated the optimal evaluation metrics. Nested cross-validation was used so that models could be tuned, evaluated, and compared simultaneously.
Results: Of 1.3 million patients from seven Canadian provinces, 5,600 people aged 19+ were randomly selected. The optimal algorithm for selecting RA cases was generated by the XGBoost classification method. Based on feature importance scores for features in the XGBoost output, a human-readable case definition was created, in which RA cases are identified when there are at least 2 occurrences of the text "rheumatoid" in any billing, encounter diagnosis, or health condition table of the patient chart. The final case definition had a sensitivity of 81.6% (95% CI, 75.6-86.4), specificity of 98.0% (95% CI, 97.4-98.5), positive predictive value of 76.3% (95% CI, 70.1-81.5), and negative predictive value of 98.6% (95% CI, 98.0-98.6).
Conclusion: A case definition for RA using primary care EMR data was developed based on the XGBoost algorithm. With high validity metrics, this case definition is expected to be a reliable tool for future epidemiological research and surveillance investigating the management of RA in the CPCSSN dataset.
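The human-readable case definition and the four validity metrics reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the table names, the chart layout (a dictionary of free-text entries per table), the case-insensitive substring counting, and the toy charts are all assumptions made for illustration, since the abstract does not describe the CPCSSN schema or how an "occurrence" is counted.

```python
from typing import Dict, List

# Hypothetical table names: the abstract lists the three table types,
# but not their actual names or structure in CPCSSN.
TABLES = ("billing", "encounter_diagnosis", "health_condition")


def count_rheumatoid(chart: Dict[str, List[str]]) -> int:
    """Count occurrences of 'rheumatoid' across the three tables.

    Assumption: matching is case-insensitive and every substring hit counts.
    """
    return sum(
        entry.lower().count("rheumatoid")
        for table in TABLES
        for entry in chart.get(table, [])
    )


def is_ra_case(chart: Dict[str, List[str]], min_occurrences: int = 2) -> bool:
    """Apply the rule: at least `min_occurrences` mentions of 'rheumatoid'."""
    return count_rheumatoid(chart) >= min_occurrences


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validity_metrics(predicted: List[bool], reference: List[bool]) -> Dict[str, float]:
    """Compare rule output with reference labels from a 2x2 confusion table."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
    }


# Toy charts invented for illustration only; they are not study data.
toy_charts = [
    {"billing": ["Rheumatoid arthritis"],
     "health_condition": ["rheumatoid arthritis, seropositive"]},
    {"encounter_diagnosis": ["query rheumatoid arthritis"]},
    {"billing": ["osteoarthritis of knee"]},
    {"billing": ["rheumatoid factor test", "rheumatoid arthritis ruled out"]},
]
# Invented reference labels standing in for the expert chart review.
toy_reference = [True, True, False, False]

toy_predicted = [is_ra_case(chart) for chart in toy_charts]
toy_metrics = validity_metrics(toy_predicted, toy_reference)
```

The fourth toy chart shows how a simple text rule can produce a false positive (two mentions of "rheumatoid" without a confirmed diagnosis), which is the kind of error reflected in a positive predictive value below 100%.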
Pages: 9