Creation, Analysis and Evaluation of AnnoMI, a Dataset of Expert-Annotated Counselling Dialogues

Cited by: 9

Authors:

Wu, Zixiu ^{[1,2]}

Balloccu, Simone ^{[3]}

Kumar, Vivek ^{[2]}

Helaoui, Rim ^{[1]}

Recupero, Diego Reforgiato ^{[2]}

Riboni, Daniele ^{[2]}

Affiliations:

[1] Philips Res, High Tech Campus, NL-5656 AE Eindhoven, Netherlands

[2] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy

[3] Univ Aberdeen, Dept Comp Sci, Aberdeen AB24 3FX, Scotland

Source:

FUTURE INTERNET | 2023 / Vol. 15 / Issue 03

Funding:

EU Horizon 2020;

Keywords:

dialogue; counselling; motivational interviewing; natural language processing; dataset; LANGUAGE; RELIABILITY;

DOI:

10.3390/fi15030110

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Research on the analysis of counselling conversations through natural language processing methods has seen remarkable growth in recent years. However, the potential of this field is still greatly limited by the lack of publicly available therapy dialogues, especially those with expert annotations. This limitation has been alleviated by the recent release of AnnoMI, the first publicly and freely available conversation dataset of 133 faithfully transcribed and expert-annotated demonstrations of high- and low-quality motivational interviewing (MI), an effective therapy strategy that evokes client motivation for positive change. In this work, we introduce new expert-annotated utterance attributes to AnnoMI and describe the entire data collection process in more detail, including dialogue source selection, transcription, annotation, and post-processing. Based on the expert annotations on key MI aspects, we carry out thorough analyses of AnnoMI with respect to counselling-related properties on the utterance, conversation, and corpus levels. Furthermore, we introduce utterance-level prediction tasks with potential real-world impacts and build baseline models. Finally, we examine the performance of the models on dialogues of different topics and probe the generalisability of the models to unseen topics.


Pages: 26

15 entries in total

[1] ANNO-MI: A DATASET OF EXPERT-ANNOTATED COUNSELLING DIALOGUES
Wu, Zixiu
Balloccu, Simone
Kumar, Vivek
Helaoui, Rim
Reiter, Ehud
Recupero, Diego Reforgiato
Riboni, Daniele
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6177 - 6181
[2] Expert-Annotated Dataset to Study Cyberbullying in Polish Language
Ptaszynski, Michal
Pieciukiewicz, Agata
Dybala, Pawel
Skrzek, Pawel
Soliwoda, Kamil
Fortuna, Marcin
Leliwa, Gniewosz
Wroczynski, Michal
DATA, 2024, 9 (01)
[3] VisImages: A Fine-Grained Expert-Annotated Visualization Dataset
Deng, Dazhen
Wu, Yihong
Shu, Xinhuan
Wu, Jiang
Fu, Siwei
Cui, Weiwei
Wu, Yingcai
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2023, 29 (07) : 3298 - 3311
[4] Biasly: An Expert-Annotated Dataset for Subtle Misogyny Detection and Mitigation
Sheppard, Brooklyn
Richter, Anna
Cohen, Allison
Smith, Elizabeth Allyn
Kneese, Tamara
Pelletier, Carolyne
Baldini, Ioana
Dong, Yue
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 427 - 452
[5] ArQuAD: An Expert-Annotated Arabic Machine Reading Comprehension Dataset
Obeidat, Rasha
Al-Harbi, Marwa
Al-Ayyoub, Mahmoud
Alawneh, Luay
COGNITIVE COMPUTATION, 2024, 16 (03) : 984 - 1003
[6] MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
Wang, Steven H.
Scardigli, Antoine
Tang, Leonard
Chen, Wei
Levkin, Dimitry
Chen, Anya
Ball, Spencer
Woodside, Thomas
Zhang, Oliver
Hendrycks, Dan
2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 16369 - 16382
[7] STANDER: An Expert-Annotated Dataset for News Stance Detection and Evidence Retrieval
Conforti, Costanza
Berndt, Jakob
Pilehvar, Mohammad Taher
Giannitsarou, Chryssi
Toxvaerd, Flavio
Collier, Nigel
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 4086 - 4101
[8] Prostate158-An expert-annotated 3T MRI dataset and algorithm for prostate cancer detection
Adams, Lisa C.
Makowski, Marcus R.
Engel, Guenther
Rattunde, Maximilian
Busch, Felix
Asbach, Patrick
Niehues, Stefan M.
Vinayahalingam, Shankeeth
van Ginneken, Bram
Litjens, Geert
Bressem, Keno K.
COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 148
[9] RadGraph-XL: A Large-Scale Expert-Annotated Dataset for Entity and Relation Extraction from Radiology Reports
Delbrouck, Jean-Benoit
Chambon, Pierre
Chen, Zhihong
Varma, Maya
Johnston, Andrew
Blankemeier, Louis
Van Veen, Dave
Bui, Tan
Truong, Steven
Langlotz, Curtis P.
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 12902 - 12915
[10] Changes in European Solidarity Before and During COVID-19: Evidence from a Large Crowd- and Expert-Annotated Twitter Dataset
Ils, Alexandra
Liu, Dan
Grunow, Daniela
Eger, Steffen
59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 1623 - 1637
