Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets

Cited by: 0
Authors
Druart, Lucas [1, 2]
Vielzeuf, Valentin [2]
Esteve, Yannick [1]
Affiliations
[1] Lab Informat Avignon, Avignon, France
[2] Orange Innovat, Rennes, France
Source
TEXT, SPEECH, AND DIALOGUE, TSD 2024, PT II | 2024 / Vol. 15049
Keywords
spoken dialogue systems; automatic annotation; large language models; spoken language understanding
DOI
10.1007/978-3-031-70566-3_18
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to choose its next action. The dialogue course thus depends on the information provided by this semantic representation. While textual datasets provide fine-grained semantic representations, spoken dialogue datasets fall behind. This paper provides insights into automatic enhancement of spoken dialogue datasets' semantic representations. Our contributions are threefold: we (1) assess the relevance of Large Language Model fine-tuning, (2) evaluate the knowledge captured by the produced annotations, and (3) highlight the implications of semi-automatic annotation.
Pages: 199-209
Page count: 11