Creation, Analysis and Evaluation of AnnoMI, a Dataset of Expert-Annotated Counselling Dialogues

Cited by: 9
Authors
Wu, Zixiu [1,2]
Balloccu, Simone [3]
Kumar, Vivek [2]
Helaoui, Rim [1]
Recupero, Diego Reforgiato [2]
Riboni, Daniele [2]
Affiliations
[1] Philips Res, High Tech Campus, NL-5656 AE Eindhoven, Netherlands
[2] Univ Cagliari, Dept Math & Comp Sci, I-09124 Cagliari, Italy
[3] Univ Aberdeen, Dept Comp Sci, Aberdeen AB24 3FX, Scotland
Funding
European Union Horizon 2020
Keywords
dialogue; counselling; motivational interviewing; natural language processing; dataset; language; reliability
DOI
10.3390/fi15030110
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Research on the analysis of counselling conversations through natural language processing methods has seen remarkable growth in recent years. However, the potential of this field is still greatly limited by the lack of publicly available therapy dialogues, especially those with expert annotations. This limitation has been alleviated by the recent release of AnnoMI, the first publicly and freely available conversation dataset of 133 faithfully transcribed and expert-annotated demonstrations of high- and low-quality motivational interviewing (MI), an effective therapy strategy that evokes client motivation for positive change. In this work, we introduce new expert-annotated utterance attributes to AnnoMI and describe the entire data collection process in more detail, including dialogue source selection, transcription, annotation, and post-processing. Based on the expert annotations of key MI aspects, we carry out thorough analyses of AnnoMI with respect to counselling-related properties at the utterance, conversation, and corpus levels. Furthermore, we introduce utterance-level prediction tasks with potential real-world impact and build baseline models for them. Finally, we examine the performance of the models on dialogues of different topics and probe the generalisability of the models to unseen topics.
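Probing generalisability to unseen topics amounts to evaluating on a topic that is absent from the training data. The following is a minimal Python sketch of such a leave-one-topic-out split under stated assumptions; it is not the authors' pipeline. The `Utterance` record, its field names, the example labels, and the helpers `leave_one_topic_out` and `majority_baseline_accuracy` are all hypothetical, and the majority-class predictor only stands in for the baseline models described in the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record layout; AnnoMI's actual field names may differ.
    transcript_id: int
    topic: str
    text: str
    label: str  # e.g. a therapist-behaviour or client-talk-type annotation


def leave_one_topic_out(utterances):
    """Yield (held_out_topic, train, test) so that every test utterance
    comes from a topic that never appears in the training split."""
    for topic in sorted({u.topic for u in utterances}):
        train = [u for u in utterances if u.topic != topic]
        test = [u for u in utterances if u.topic == topic]
        yield topic, train, test


def majority_baseline_accuracy(train, test):
    """Accuracy of always predicting the most frequent training label.
    A deliberately trivial stand-in for a real trained classifier."""
    if not train or not test:
        return 0.0
    majority = Counter(u.label for u in train).most_common(1)[0][0]
    return sum(u.label == majority for u in test) / len(test)


if __name__ == "__main__":
    # Toy, invented utterances purely for illustration.
    data = [
        Utterance(0, "smoking", "How do you feel about cutting down?", "question"),
        Utterance(0, "smoking", "It sounds like you're worried.", "reflection"),
        Utterance(1, "alcohol", "What would help you drink less?", "question"),
        Utterance(1, "alcohol", "You want to stay healthy.", "reflection"),
        Utterance(2, "diet", "Tell me more about your meals.", "question"),
    ]
    for topic, train, test in leave_one_topic_out(data):
        acc = majority_baseline_accuracy(train, test)
        print(f"held out {topic}: train={len(train)} test={len(test)} acc={acc:.2f}")
```

In a real evaluation the majority-class function would be replaced by a trained model. Assuming each transcript carries a single topic label, grouping by topic also keeps every held-out transcript entirely out of training.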
Pages: 26