Competency-Based Assessments: Leveraging Artificial Intelligence to Predict Subcompetency Content

Cited by: 11
Authors
Booth, Gregory J. [1, 2]
Ross, Benjamin [1]
Cronin, William A. [2, 3]
McElrath, Angela [2, 4]
Cyr, Kyle L. [2, 3]
Hodgson, John A. [2, 3]
Sibley, Charles [2, 4]
Ismawan, J. Martin [2, 5]
Zuehl, Alyssa [2, 4]
Slotto, James G. [2, 5]
Higgs, Maureen [1, 2]
Haldeman, Matthew [1]
Geiger, Phillip [1, 2]
Jardine, Dink [6]
Affiliations
[1] Naval Med Ctr Portsmouth, Dept Anesthesiol & Pain Med, 620 John Paul Jones Circle, Portsmouth, VA 23708 USA
[2] Uniformed Serv Univ Hlth Sci, Bethesda, MD USA
[3] Walter Reed Natl Mil Med Ctr, Natl Capitol Consortium, Dept Anesthesiol, Bethesda, MD USA
[4] San Antonio Uniformed Serv Hlth Educ Consortium, Dept Anesthesiol, San Antonio, TX USA
[5] Naval Med Ctr San Diego, Dept Anesthesiol, San Diego, CA USA
[6] Naval Med Ctr Camp Lejeune, Profess Educ, Camp Lejeune, NC USA
Keywords
FEEDBACK;
DOI
10.1097/ACM.0000000000005115
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
Purpose: Faculty feedback on trainees is critical to guiding trainee progress in a competency-based medical education framework. The authors aimed to develop and evaluate a Natural Language Processing (NLP) algorithm that automatically categorizes narrative feedback into corresponding Accreditation Council for Graduate Medical Education Milestone 2.0 subcompetencies.

Method: Ten academic anesthesiologists analyzed 5,935 narrative evaluations on anesthesiology trainees at 4 graduate medical education (GME) programs between July 1, 2019, and June 30, 2021. Each sentence (n = 25,714) was labeled with the Milestone 2.0 subcompetency that best captured its content or was labeled as demographic or not useful. Inter-rater agreement was assessed by Fleiss' Kappa. The authors trained an NLP model to predict feedback subcompetencies using data from 3 sites and evaluated its performance at a fourth site. Performance metrics included area under the receiver operating characteristic curve (AUC), positive predictive value, sensitivity, F1, and calibration curves. The model was implemented at 1 site in a self-assessment exercise.

Results: Fleiss' Kappa for subcompetency agreement was moderate (0.44). Model performance was good for professionalism, interpersonal and communication skills, and practice-based learning and improvement (AUC 0.79, 0.79, and 0.75, respectively). Subcompetencies within medical knowledge and patient care ranged from fair to excellent (AUC 0.66-0.84 and 0.63-0.88, respectively). Performance for systems-based practice was poor (AUC 0.59). Performances for demographic and not useful categories were excellent (AUC 0.87 for both). In approximately 1 minute, the model interpreted several hundred evaluations and produced individual trainee reports with organized feedback to guide a self-assessment exercise. The model was built into a web-based application.

Conclusions: The authors developed an NLP model that recognized the feedback language of anesthesiologists across multiple GME programs. The model was operationalized in a self-assessment exercise. It is a powerful tool which rapidly organizes large amounts of narrative feedback.
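The inter-rater agreement statistic named in the abstract, Fleiss' Kappa, can be illustrated with a short sketch. This is not the authors' code: the function name `fleiss_kappa`, the toy labels, and the assumption that every sentence is labeled by the same number of raters are all hypothetical and chosen only to show how the statistic is computed.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of raters.

    ratings    -- list of lists; each inner list holds the labels given to one item
    categories -- every label that may appear (labels outside this list are not counted)
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # How many raters chose each category, per item.
    counts = [Counter(row) for row in ratings]

    # Observed agreement for each item, then averaged over items.
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items

    # Agreement expected by chance, from the overall category proportions.
    p_cat = [sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # Every rating fell in one category; kappa is undefined in that case.
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical toy data: 4 sentences, 3 raters each, made-up labels.
    toy = [
        ["PC", "PC", "MK"],
        ["PROF", "PROF", "PROF"],
        ["MK", "MK", "MK"],
        ["PC", "MK", "PROF"],
    ]
    print(round(fleiss_kappa(toy, ["PC", "MK", "PROF"]), 3))
```

A value of 1 indicates perfect agreement and values near 0 indicate agreement no better than chance, so the reported 0.44 sits between those extremes.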
Pages: 497-504
Page count: 8