Prompt text classifications with transformer models! An exemplary introduction to prompt-based learning with large language models

Cited by: 12
Authors
Mayer, Christian W. F. [1]
Ludwig, Sabrina [1]
Brandt, Steffen [2]
Affiliations
[1] Univ Mannheim, Area Econ & Business Educ, Mannheim, Germany
[2] Opencampus Sh, Kiel, Germany
Keywords
Artificial intelligence in education; machine learning; natural language processing; transformer-based language models; prompt-based learning; classification; AGREEMENT; PRIVACY
DOI
10.1080/15391523.2022.2142872
Chinese Library Classification number
G40 [Education]
Discipline classification codes
040101; 120403
Abstract
This study investigates the potential of automated classification using prompt-based learning approaches with transformer models (large language models trained in an unsupervised manner) for a domain-specific classification task. Prompt-based learning with zero or few shots has the potential to (1) make use of artificial intelligence without sophisticated programming skills and (2) make use of artificial intelligence without fine-tuning models with large amounts of labeled training data. We apply this novel method in an experiment that uses so-called zero-shot classification as a baseline model and a few-shot approach for classification. For comparison, we also fine-tuned a language model on the given classification task and conducted a second, independent human rating to compare with the human ratings from the original study. The dataset consists of 2,088 email responses to a domain-specific problem-solving task that were manually labeled for their professional communication style. With the novel prompt-based learning approach, we achieved a Cohen's kappa of .40, while the fine-tuning approach yielded a kappa of .59, and the new human rating achieved a kappa of .58 with the original human ratings. However, the classifications from the machine learning models have the advantage that each prediction comes with a reliability estimate, which allows us to identify responses that are difficult to score. We therefore argue that response ratings should be based on a reciprocal workflow of machine raters and human raters, in which the machine rates easy-to-classify responses and the human raters focus and agree on the responses that are difficult to classify. Further, we believe that this new, more intuitive, prompt-based learning approach will enable more people to use artificial intelligence.
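The two quantitative ideas in the abstract, agreement measured with Cohen's kappa and a workflow that sends low-reliability predictions to human raters, can be illustrated with a minimal Python sketch. This is not the authors' code: the label names, the confidence values, and the 0.8 threshold are hypothetical and chosen only for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


def route_by_confidence(predictions, threshold=0.8):
    """Split (label, confidence) pairs into machine-rated and human-review indices.

    The threshold is a hypothetical value, not one taken from the study.
    """
    machine, human = [], []
    for index, (_, confidence) in enumerate(predictions):
        (machine if confidence >= threshold else human).append(index)
    return machine, human


# Hypothetical labels for four responses from two raters.
human_labels = ["polite", "polite", "impolite", "impolite"]
model_labels = ["polite", "impolite", "impolite", "impolite"]
kappa = cohens_kappa(human_labels, model_labels)

# Hypothetical model outputs with reliability estimates.
scored = [("polite", 0.95), ("impolite", 0.55), ("polite", 0.80)]
machine_rated, needs_human = route_by_confidence(scored)
```

In this toy example the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5; the second response falls below the threshold and would be passed to human raters.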
Pages: 125-141
Number of pages: 17