[41]
Ramineni C., Williamson D., Understanding mean score differences between the e-rater® automated scoring engine and humans for demographically based groups in the GRE® general test, ETS Research Report Series, 2018, pp. 1-31, (2018)
[42]
Shermis M.D., Burstein J., Bursky S.A., Introduction to automated essay evaluation, Handbook of automated essay evaluation: Current applications and new directions, pp. 1-15, (2013)
[43]
Shrout P.E., Fleiss J.L., Intraclass correlations: Uses in assessing rater reliability, Psychological Bulletin, 86, 2, pp. 420-428, (1979)
[44]
Singleton-Jackson J.A., Lumsden D.B., Newsom R., Johnny still can't write, even if he goes to college: A study of writing proficiency in higher education graduate students, Current Issues in Education, 12, 10, (2009)
[45]
Warschauer M., Ware P., Automated writing evaluation: Defining the classroom research agenda, Language Teaching Research, 10, 2, pp. 157-180, (2006)
[46]
Weigle S.C., Using FACETS to model rater training effects, Language Testing, 15, 2, pp. 263-287, (1998)
[47]
Weigle S.C., English as a second language writing and automated essay evaluation, Handbook of automated essay evaluation: Current applications and new directions, pp. 36-54, (2013)
[48]
Zhou Y., Muresanu A.I., Han Z., Paster K., Pitis S., Chan H., Ba J., Large language models are human-level prompt engineers, International conference on learning representations 2023, (2023)