A systematic review of validity evidence for checklists versus global rating scales in simulation-based assessment

Cited: 210
Authors
Ilgen, Jonathan S. [1 ]
Ma, Irene W. Y. [2 ]
Hatala, Rose [3 ]
Cook, David A. [4 ,5 ]
Affiliations
[1] Univ Washington, Sch Med, Dept Med, Div Emergency Med, Seattle, WA 98104 USA
[2] Univ Calgary, Dept Med, Calgary, AB, Canada
[3] Univ British Columbia, Dept Med, Vancouver, BC, Canada
[4] Mayo Clin, Coll Med, Multidisciplinary Simulat Ctr, Rochester, MN USA
[5] Mayo Clin, Div Gen Internal Med, Rochester, MN USA
Keywords
OBJECTIVE STRUCTURED ASSESSMENT; TECHNOLOGY-ENHANCED SIMULATION; ACUTE-CARE SKILLS; TECHNICAL SKILL; PSYCHOMETRIC PROPERTIES; PROGRAMMATIC ASSESSMENT; INTERRATER RELIABILITY; PERFORMANCE ASSESSMENT; RESIDENT PERFORMANCE; CONSTRUCT-VALIDATION;
DOI: 10.1111/medu.12621
CLC Number: G40 [Education]
Discipline codes: 040101; 120403
Abstract
Context: The relative advantages and disadvantages of checklists and global rating scales (GRSs) have long been debated. To compare the merits of these scale types, we conducted a systematic review of the validity evidence for checklists and GRSs in the context of simulation-based assessment of health professionals.

Methods: We conducted a systematic review of multiple databases, including MEDLINE, EMBASE and Scopus, to February 2013. We selected studies that used both a GRS and a checklist in the simulation-based assessment of health professionals. Reviewers working in duplicate evaluated five domains of validity evidence, including correlation between scales and reliability. We collected information about raters, instrument characteristics, assessment context and task. We pooled reliability and correlation coefficients using random-effects meta-analysis.

Results: We found 45 studies that used a checklist and GRS in simulation-based assessment. All studies included physicians or physicians in training; one study also included nurse anaesthetists. Topics of assessment included open and laparoscopic surgery (n=22), endoscopy (n=8), resuscitation (n=7) and anaesthesiology (n=4). The pooled GRS-checklist correlation was 0.76 (95% confidence interval [CI] 0.69-0.81, n=16 studies). Inter-rater reliability was similar between scales (GRS 0.78, 95% CI 0.71-0.83, n=23; checklist 0.81, 95% CI 0.75-0.85, n=21), whereas GRS inter-item reliabilities (0.92, 95% CI 0.84-0.95, n=6) and inter-station reliabilities (0.80, 95% CI 0.73-0.85, n=10) were higher than those for checklists (0.66, 95% CI 0-0.84, n=4 and 0.69, 95% CI 0.56-0.77, n=10, respectively). Content evidence for GRSs usually referenced previously reported instruments (n=33), whereas content evidence for checklists usually described expert consensus (n=26). Checklists and GRSs usually had similar evidence for relations to other variables.
Conclusions: Checklist inter-rater reliability and trainee discrimination were more favourable than suggested in earlier work, but each task requires a separate checklist. Compared with the checklist, the GRS has higher average inter-item and inter-station reliability, can be used across multiple tasks, and may better capture nuanced elements of expertise.
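The methods describe pooling correlation coefficients with random-effects meta-analysis. The paper does not specify its exact estimator, but a common approach is to Fisher z-transform the per-study correlations, estimate between-study variance with the DerSimonian-Laird method, and back-transform the pooled estimate. The sketch below illustrates that standard technique; the example correlations and sample sizes are invented for illustration and are not data from the review.

```python
import math

def pool_correlations(rs, ns):
    """DerSimonian-Laird random-effects pooling of correlation
    coefficients via Fisher's z transform.

    rs -- per-study correlation coefficients
    ns -- per-study sample sizes
    Returns (pooled_r, ci_low, ci_high), back-transformed to r.
    """
    # Fisher z transform; the sampling variance of z is 1 / (n - 3)
    zs = [0.5 * math.log((1 + r) / (1 - r)) for r in rs]
    vs = [1.0 / (n - 3) for n in ns]
    ws = [1.0 / v for v in vs]

    # Fixed-effect estimate and Cochran's Q heterogeneity statistic
    z_fe = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_fe) ** 2 for w, z in zip(ws, zs))
    df = len(rs) - 1

    # DerSimonian-Laird between-study variance tau^2 (truncated at 0)
    c = sum(ws) - sum(w ** 2 for w in ws) / sum(ws)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights, pooled estimate and standard error
    ws_re = [1.0 / (v + tau2) for v in vs]
    z_re = sum(w * z for w, z in zip(ws_re, zs)) / sum(ws_re)
    se = math.sqrt(1.0 / sum(ws_re))

    # Back-transform z to r (tanh is the inverse Fisher transform)
    return (math.tanh(z_re),
            math.tanh(z_re - 1.96 * se),
            math.tanh(z_re + 1.96 * se))

# Illustrative (invented) data: three GRS-checklist correlations
r, lo, hi = pool_correlations([0.70, 0.80, 0.75], [50, 40, 60])
```

The same machinery applies to the reliability coefficients, which are also bounded correlations and are typically transformed before pooling.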
Pages: 161-173 (13 pages)