Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification

Cited by: 28

Authors:

Clavie, Benjamin ^{[1]}

Ciceu, Alexandru ^{[2]}

Naylor, Frederick ^{[1]}

Soulie, Guillaume ^{[1]}

Brightwell, Thomas ^{[1]}

Affiliations:

[1] Bright Network, Edinburgh, Midlothian, Scotland

[2] Silicon Grove, Edinburgh, Midlothian, Scotland

Source:

NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2023 | 2023 / Vol. 13913

Keywords:

Large Language Models; Text Classification; Natural Language Processing; Industrial Applications; Prompt Engineering;

DOI:

10.1007/978-3-031-35320-8_1

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This case study investigates the task of job classification in a real-world setting, where the goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position. We explore multiple approaches to text classification, including supervised approaches such as traditional models like Support Vector Machines (SVMs) and state-of-the-art deep learning methods such as DeBERTa. We compare them with Large Language Models (LLMs) used in both few-shot and zero-shot classification settings. To accomplish this task, we employ prompt engineering, a technique that involves designing prompts to guide the LLMs towards the desired output. Specifically, we evaluate the performance of two commercially available state-of-the-art GPT-3.5-based language models, text-davinci-003 and gpt-3.5-turbo. We also conduct a detailed analysis of the impact of different aspects of prompt engineering on the model's performance. Our results show that, with a well-designed prompt, a zero-shot gpt-3.5-turbo classifier outperforms all other models, achieving a 6% increase in Precision@95% Recall compared to the best supervised approach. Furthermore, we observe that the wording of the prompt is a critical factor in eliciting the appropriate "reasoning" in the model, and that seemingly minor aspects of the prompt significantly affect the model's performance.
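The abstract reports results as Precision@95% Recall but this record does not say how the metric is computed. A minimal sketch of one common reading follows: the best precision reachable at any decision threshold whose recall is at least 95%. The function name `precision_at_recall`, its parameters, and the toy labels and scores are illustrative assumptions, not taken from the paper.

```python
def precision_at_recall(y_true, scores, min_recall=0.95):
    """Best precision over all score thresholds with recall >= min_recall.

    Hypothetical helper: y_true holds 0/1 labels, scores holds the
    classifier's confidence that each item is positive.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")

    # Rank items by descending score; each distinct score is a candidate
    # threshold ("predict positive if score >= threshold").
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)

    best = 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every item tied at this score before evaluating.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        if recall >= min_recall:
            best = max(best, tp / (tp + fp))
    return best


# Toy data (made up for illustration): 3 positives, 3 negatives.
labels = [1, 1, 0, 1, 0, 0]
confidences = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(precision_at_recall(labels, confidences))
```

On this toy data, reaching 95% recall requires accepting every item scored 0.6 or higher, which admits one false positive alongside the three true positives.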

Pages: 3-17

Number of pages: 15
