MultiWOZ-PT: A Task-oriented Dialogue Dataset in Portuguese

Cited by: 0
Authors
Ferreira, Patricia [1]
Pais, Francisco [1]
Silva, Catarina [1]
Alves, Ana [2]
Oliveira, Hugo Goncalo [1]
Affiliations
[1] Univ Coimbra, CISUC, LASI, DEI, Coimbra, Portugal
[2] Inst Super Engn Coimbra, CISUC, LASI, Coimbra, Portugal
Source
LINGUAMATICA | 2024 / Vol. 16 / No. 02
Keywords
task-oriented dialogue dataset; translation; MultiWOZ; dialogue state tracking; intent recognition; slot filling
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Despite the language's widespread use, publicly available annotated Portuguese dialogue corpora are scarce. This poses a significant challenge to the development of effective dialogue systems that communicate in Portuguese. With this in mind, we present MultiWOZ-PT, a new task-oriented dialogue dataset that results from the manual translation of dialogues in the MultiWOZ dataset into the European variety of Portuguese, together with an adaptation of its database. We provide comprehensive guidelines and insights into the process of creating MultiWOZ-PT and, to demonstrate its practical utility, we conducted experiments in two task-oriented scenarios, Intent Recognition and Dialogue State Tracking, both useful for dialogue systems. The reported results illustrate the dataset's effectiveness and its potential for training and evaluating language understanding and dialogue management models for Portuguese. MultiWOZ-PT therefore constitutes a significant contribution to the computational processing of this language, fostering further research and development.
Pages: 16