Biasly: An Expert-Annotated Dataset for Subtle Misogyny Detection and Mitigation

Authors
Sheppard, Brooklyn [1]
Richter, Anna [1]
Cohen, Allison [1]
Smith, Elizabeth Allyn [2]
Kneese, Tamara [3]
Pelletier, Carolyne [4, 5]
Baldini, Ioana [6]
Dong, Yue [7]
Affiliations
[1] Mila Quebec AI Inst, Montreal, PQ, Canada
[2] Univ Quebec Montreal, Montreal, PQ, Canada
[3] Data & Soc Res Inst, New York, NY USA
[4] Reliant AI, Berlin, Germany
[5] Mantium, Columbus, OH USA
[6] IBM Res, Armonk, NY USA
[7] Univ Calif Riverside, Riverside, CA USA
Source
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024 | 2024
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Using novel approaches to dataset development, the Biasly dataset captures the nuance and subtlety of misogyny in ways that are unique within the literature. Built in collaboration with multi-disciplinary experts and annotators themselves, the dataset contains annotations of movie subtitles, capturing colloquial expressions of misogyny in North American film. The open-source dataset can be used for a range of NLP tasks, including binary and multi-label classification, severity score regression, and text generation for rewrites. In this paper, we discuss the methodology used, analyze the annotations obtained, provide baselines for each task using common NLP algorithms, and furnish error analyses to give insight into model behaviour when fine-tuned on the Biasly dataset.
Pages: 427-452
Page count: 26