Prompt text classifications with transformer models! An exemplary introduction to prompt-based learning with large language models

Cited by: 12

Authors:

Mayer, Christian W. F. [1]

Ludwig, Sabrina [1]

Brandt, Steffen [2]

Affiliations:

[1] Univ Mannheim, Area Econ & Business Educ, Mannheim, Germany

[2] Opencampus Sh, Kiel, Germany

Source:

JOURNAL OF RESEARCH ON TECHNOLOGY IN EDUCATION | 2023 / Vol. 55 / No. 01

Keywords:

Artificial intelligence in education; machine learning; natural language processing; transformer-based language models; prompt-based learning; classification; AGREEMENT; PRIVACY

DOI:

10.1080/15391523.2022.2142872

Chinese Library Classification (CLC) number:

G40 [Education]

Discipline classification codes:

040101; 120403

Abstract:

This study investigates the potential of automated classification using prompt-based learning approaches with transformer models (large language models trained in an unsupervised manner) for a domain-specific classification task. Prompt-based learning with zero or few shots has the potential to make artificial intelligence usable (1) without sophisticated programming skills and (2) without fine-tuning models on large amounts of labeled training data. We apply this novel method in an experiment that uses so-called zero-shot classification as a baseline model and a few-shot approach for classification. For comparison, we also fine-tuned a language model on the same classification task and conducted a second, independent human rating to compare with the human ratings from the original study. The dataset consists of 2,088 email responses to a domain-specific problem-solving task that were manually labeled for their professional communication style. With the novel prompt-based learning approach, we achieved a Cohen's kappa of .40, while the fine-tuning approach yielded a kappa of .59, and the new human rating achieved a kappa of .58 with the original human ratings. However, the classifications from the machine learning models have the advantage that each prediction comes with a reliability estimate, which allows us to identify responses that are difficult to score. We therefore argue that response ratings should be based on a reciprocal workflow of machine and human raters, in which the machine rates easy-to-classify responses and the human raters focus on, and agree on, the responses that are difficult to classify. Further, we believe that this new, more intuitive prompt-based learning approach will enable more people to use artificial intelligence.
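The agreement figures in the abstract are Cohen's kappa values, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between two raters and p_e is the agreement expected by chance from each rater's label frequencies. The Python sketch below computes kappa for two label sequences and, under stated assumptions, illustrates the proposed split between machine and human raters. The function `split_by_confidence`, its (item_id, label, confidence) input format, the 0.9 threshold, and the toy labels "pro"/"non" are hypothetical choices for illustration, not details or data taken from the study.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Return Cohen's kappa, (p_o - p_e) / (1 - p_e), for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # p_o: observed agreement, the share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected by chance from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used the same single label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def split_by_confidence(
    predictions: Sequence[Tuple[int, str, float]], threshold: float = 0.9
) -> Tuple[List[Tuple[int, str]], List[int]]:
    """Hypothetical routing step: keep confident machine labels, send the rest to humans.

    Each prediction is an (item_id, label, confidence) triple. The input format and
    the 0.9 threshold are illustrative assumptions, not values from the study.
    """
    machine_rated = [(item, label) for item, label, conf in predictions if conf >= threshold]
    needs_human = [item for item, _, conf in predictions if conf < threshold]
    return machine_rated, needs_human


# Usage with toy labels (invented for illustration, not data from the study).
a = ["pro", "pro", "non", "non", "pro", "non"]
b = ["pro", "non", "non", "non", "pro", "pro"]
print(round(cohens_kappa(a, b), 3))  # 0.333
print(split_by_confidence([(1, "pro", 0.95), (2, "non", 0.60), (3, "non", 0.91)]))
# ([(1, 'pro'), (3, 'non')], [2])
```

In the toy example the raters agree on 4 of 6 items (p_o of about 0.667), and because each used both labels equally often, p_e = 0.5, so kappa is about 0.333. The fixed threshold is the simplest possible routing rule; in practice it would have to be tuned to balance how much human rating effort is saved against how much machine error is accepted.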

Pages: 125–141

Page count: 17

References: 81
