Development and validation of a rheumatoid arthritis case definition: a machine learning approach using data from primary care electronic medical records

Authors
Pham, Anh N. Q. [1, 2, 3, 4]
Barber, Claire E. H. [2, 3]
Drummond, Neil [2, 3, 5]
Jasper, Lisa [6]
Klein, Doug [5]
Lindeman, Cliff [7]
Widdifield, Jessica [8, 9]
Williamson, Tyler [2, 3]
Jones, C. Allyson [6]
Affiliations
[1] Simon Fraser Univ, Dept Hlth Sci, Burnaby, BC, Canada
[2] Univ Calgary, Dept Med, Calgary, AB, Canada
[3] Univ Calgary, Dept Community Hlth Sci, Calgary, AB, Canada
[4] Simon Fraser Univ, Pacific Inst Pathogen Pandem & Soc, Burnaby, BC, Canada
[5] Univ Alberta, Dept Family Med, Edmonton, AB, Canada
[6] Univ Alberta, Fac Rehabil Med, Edmonton, AB, Canada
[7] Coll Phys & Surg Alberta, Edmonton, AB, Canada
[8] Sunnybrook Res Inst, Holland Bone & Joint Res Program, Toronto, ON, Canada
[9] Univ Toronto, Inst Hlth Policy Management & Evaluat, ICES, Toronto, ON, Canada
Keywords
Rheumatoid arthritis; Case definition; EMR phenotyping; Electronic medical records; Machine learning; Surveillance
DOI
10.1186/s12911-024-02776-w
Chinese Library Classification number
R-058
Abstract
Background: Rheumatoid arthritis (RA) is a chronic inflammatory disease that is primarily diagnosed and managed by rheumatologists; however, it is often primary care providers who first encounter RA-related symptoms. This study developed and validated a case definition for RA using national surveillance data from primary care settings.

Methods: This cross-sectional validation study used structured electronic medical record (EMR) data from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN). Based on a reference set generated by EMR reviews by five experts, three machine learning steps were applied: a 'bag-of-words' approach to feature generation; feature reduction using a feature importance measure coupled with recursive feature elimination and clustering; and classification using tree-based methods (Decision Tree, Random Forest, and Extreme Gradient Boosting [XGBoost]). The three tree-based algorithms were compared to identify the procedure that generated the optimal evaluation metrics. Nested cross-validation was used so that models could be tuned, evaluated, and compared simultaneously.

Results: Of 1.3 million patients from seven Canadian provinces, 5,600 people aged 19 years or older were randomly selected. The optimal algorithm for selecting RA cases was generated by the XGBoost classification method. Based on feature importance scores for features in the XGBoost output, a human-readable case definition was created, in which RA cases are identified when there are at least 2 occurrences of the text "rheumatoid" in any billing, encounter diagnosis, or health condition table of the patient chart. The final case definition had a sensitivity of 81.6% (95% CI, 75.6-86.4), specificity of 98.0% (95% CI, 97.4-98.5), positive predictive value of 76.3% (95% CI, 70.1-81.5), and negative predictive value of 98.6% (95% CI, 98.0-98.6).

Conclusion: A case definition for RA using primary care EMR data was developed based on the XGBoost algorithm. With high validity metrics, this case definition is expected to be a reliable tool for future epidemiological research and surveillance investigating the management of RA in the CPCSSN dataset.
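The human-readable rule and the four validity metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The table keys (`billing`, `encounter_diagnosis`, `health_condition`), the function names, and the toy charts are hypothetical. The sketch also assumes that occurrences are counted case-insensitively and summed across the three tables, which the abstract does not spell out.

```python
import re
from typing import Dict, Iterable, List

# Case-insensitive match on the literal text "rheumatoid" (assumption: the
# abstract does not say whether matching is case-sensitive).
RA_PATTERN = re.compile(r"rheumatoid", re.IGNORECASE)

# Hypothetical table names standing in for the billing, encounter diagnosis,
# and health condition tables of a patient chart.
TABLES = ("billing", "encounter_diagnosis", "health_condition")


def count_ra_mentions(texts: Iterable[str]) -> int:
    """Count every occurrence of "rheumatoid" in a collection of text entries."""
    return sum(len(RA_PATTERN.findall(text)) for text in texts if text)


def is_ra_case(chart: Dict[str, List[str]], threshold: int = 2) -> bool:
    """Flag a chart as RA when the three tables together hold >= threshold mentions."""
    total = sum(count_ra_mentions(chart.get(name, [])) for name in TABLES)
    return total >= threshold


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def validity_metrics(predicted: List[bool], reference: List[bool]) -> Dict[str, float]:
    """Compare rule output with reference-standard labels (e.g. expert chart review)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fp = sum(1 for p, r in pairs if p and not r)      # false positives
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fn = sum(1 for p, r in pairs if not p and r)      # false negatives
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Toy charts, invented purely for illustration (not study data).
    charts = [
        {"billing": ["Rheumatoid arthritis"],
         "encounter_diagnosis": ["rheumatoid arthritis, seropositive"]},
        {"health_condition": ["Rheumatoid factor ordered"]},
        {"billing": ["osteoarthritis of knee"]},
        {"encounter_diagnosis": ["RA follow-up"]},
    ]
    reference = [True, False, False, True]
    predicted = [is_ra_case(chart) for chart in charts]
    print(predicted)
    print(validity_metrics(predicted, reference))
```

The fourth toy chart shows one way a text-based rule can miss a case: a record that only uses the abbreviation "RA" contains no "rheumatoid" text, so it counts against sensitivity.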
Pages: 9