Using multivariate generalizability theory to assess the effect of content stratification on the reliability of a performance assessment

Cited by: 8
Authors
Keller, Lisa A. [1 ]
Clauser, Brian E. [1 ]
Swanson, David B. [1 ]
Affiliation
[1] University of Massachusetts, Amherst, MA 01003, USA
Keywords
Generalizability theory; Performance assessment; Reliability; Clinical skills; Specifications; Alpha; Model
DOI
10.1007/s10459-010-9233-8
Chinese Library Classification
G40 [Education]
Discipline Codes
040101; 120403
Abstract
In recent years, demand for performance assessments has continued to grow. However, performance assessments are notorious for low reliability, and in particular, low reliability resulting from task specificity. Since reliability analyses typically treat the performance tasks as randomly sampled from an infinite universe of tasks, these estimates of reliability may not be accurate. For tests built according to a table of specifications, tasks are randomly sampled from different strata (content domains, skill areas, etc.). If these strata remain fixed in the test construction process, ignoring the stratification in the reliability analysis results in an underestimate of "parallel forms" reliability and an overestimate of the person-by-task variance component. This research explores the effect of representing the stratification appropriately, or misrepresenting it, in the estimation of reliability and the standard error of measurement. Both multivariate and univariate generalizability studies are reported. Results indicate that proper specification of the analytic design is essential to obtaining accurate information about both the generalizability of the assessment and the standard error of measurement. Illustrative D studies further demonstrate the effect under a variety of situations and test designs. Additional benefits of multivariate generalizability theory in test design and evaluation are also discussed.
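To make the distinction in the abstract concrete, the two analytic designs can be written out in standard generalizability-theory notation. The symbols and sample sizes below are illustrative assumptions, not taken from the article: n_s fixed content strata, each contributing n_{t:s} randomly sampled tasks. Treating all tasks as sampled from one undifferentiated universe gives the usual person-by-task (p x t) coefficient, whereas treating tasks as nested within fixed strata (p x (t:s), s fixed) moves the person-by-stratum interaction into universe-score variance:

    E\rho^2_{p \times t} = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{pt}/n_t}

    E\rho^2_{p \times (t:s),\ s\ \mathrm{fixed}} = \frac{\sigma^2_p + \sigma^2_{ps}/n_s}{\sigma^2_p + \sigma^2_{ps}/n_s + \sigma^2_{pt:s}/(n_s\, n_{t:s})}

Because the undifferentiated analysis pools \sigma^2_{ps} and \sigma^2_{pt:s} into a single \sigma^2_{pt} term, it overstates the person-by-task component and understates "parallel forms" reliability relative to the fixed-strata (or multivariate composite) analysis, which is the pattern the abstract describes.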
Pages: 717–733 (17 pages)