Defining equivalence in medical education evaluation and research: does a distribution-based approach work?

Cited by: 8
Authors
Rusticus, Shayna A. [1, 2]
Eva, Kevin W. [3]
Affiliations
[1] Univ British Columbia, Evaluat Studies Unit, 2775 Laurel St,11th Floor, Vancouver, BC V5Z 1M9, Canada
[2] Univ British Columbia, Ctr Hlth Educ Scholarship, 2775 Laurel St,11th Floor, Vancouver, BC V5Z 1M9, Canada
[3] Univ British Columbia, Ctr Hlth Educ Scholarship, 950 W 10th Ave, Vancouver, BC V5Z 1L9, Canada
Keywords
Distribution-based methods; Effect size; Equivalence tests; Medical education; Program evaluation; TESTS;
DOI
10.1007/s10459-015-9633-x
Chinese Library Classification number
G40 [Education];
Discipline classification code
040101 ; 120403 ;
Abstract
Educators often seek to demonstrate the equivalence of groups, such as whether or not students achieve comparable success regardless of the site at which they trained. A methodological consideration that is often underappreciated is how to operationalize equivalence. This study examined whether a distribution-based approach, based on effect size, can identify an appropriate equivalence threshold for medical education data. Thirty-nine individuals rated program site equivalence on a series of simulated pairwise bar graphs representing one of four measures with which they had prior experience: (1) undergraduate academic achievement, (2) a student experience survey, (3) an Objective Structured Clinical Exam global rating scale, or (4) a licensing exam. Descriptive statistics and repeated measures ANOVA examined the effects on equivalence ratings of (a) the difference between means, (b) variability in scores, and (c) which program site (the larger or smaller) scored higher. The equivalence threshold was defined as the point at which 50% of participants rated the sites as non-equivalent. Across the four measures, the equivalence thresholds converged on an average effect size of Cohen's d = 0.57 (range of 0.50-0.63). This corresponded to an average mean difference of 10% (range of 3-13%). These results are discussed in reference to findings from the health-related quality of life field, which has demonstrated that d = 0.50 represents a consistent threshold for perceived change. This study provides preliminary, empirically based guidance for defining an equivalence threshold for researchers and evaluators conducting equivalence tests.
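The two quantities the abstract relies on, Cohen's d and the point at which 50% of raters judge two sites non-equivalent, can be illustrated with a minimal Python sketch. The function names, the pooled-SD form of d, the linear interpolation between tested effect sizes, and all data values are assumptions made for illustration; the record does not state how the authors computed either quantity.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample SD (assumed form)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def threshold_at_half(effect_sizes, prop_non_equivalent):
    """Effect size at which the proportion of 'non-equivalent' ratings reaches 0.5.

    Hypothetical helper: linearly interpolates between adjacent tested effect
    sizes (given in ascending order). Returns None if the proportions never
    cross from below 0.5 to 0.5 or above.
    """
    points = list(zip(effect_sizes, prop_non_equivalent))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


# Made-up scores for two program sites (not data from the study).
site_a = [70, 75, 80, 85, 90]
site_b = [65, 70, 75, 80, 85]
print(round(cohens_d(site_a, site_b), 2))

# Made-up proportions of raters calling the sites non-equivalent.
print(threshold_at_half([0.2, 0.4, 0.6, 0.8], [0.1, 0.3, 0.7, 0.9]))
```

A threshold obtained this way could then serve as the equivalence bound in a formal equivalence test, which is the use the abstract suggests.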
Pages: 359-373
Page count: 15
Related papers
17 in total
  • [1] [Anonymous], 2020, FUNCT STRUCT MED SCH
  • [2] [Anonymous], 2011, PRACTICAL ASSESSMENT
  • [3] Assessing equivalence: An alternative to the use of difference tests for measuring disparities in vaccination coverage
    Barker, LE
    Luman, ET
    McCauley, MM
    Chu, SY
    [J]. AMERICAN JOURNAL OF EPIDEMIOLOGY, 2002, 156 (11) : 1056 - 1061
  • [4] A POWER PRIMER
    COHEN, J
    [J]. PSYCHOLOGICAL BULLETIN, 1992, 112 (01) : 155 - 159
  • [5] Understanding the minimum clinically important difference: a review of concepts and methods
    Copay, Anne G.
    Subach, Brian R.
    Glassman, Steven D.
    Polly, David W., Jr.
    Schuler, Thomas C.
    [J]. SPINE JOURNAL, 2007, 7 (05) : 541 - 546
  • [6] Recommendations for applying tests of equivalence
    Cribbie, RA
    Gruman, JA
    Arpin-Cribbie, CA
    [J]. JOURNAL OF CLINICAL PSYCHOLOGY, 2004, 60 (01) : 1 - 10
  • [7] Defining clinically meaningful change in health-related quality of life
    Crosby, RD
    Kolotkin, RL
    Williams, GR
    [J]. JOURNAL OF CLINICAL EPIDEMIOLOGY, 2003, 56 (05) : 395 - 407
  • [8] Internet versus paper-and-pencil survey methods in psychological experiments: Equivalence testing of participant responses to health-related messages
    Lewis, Ioni
    Watson, Barry
    White, Katherine Marie
    [J]. AUSTRALIAN JOURNAL OF PSYCHOLOGY, 2009, 61 (02) : 107 - 116
  • [9] Medical Council of Canada, 2015, SCOR
  • [10] MILLER GA, 1956, PSYCHOL REV, V63, P81, DOI 10.1037/0033-295X.101.2.343