Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

Cited by: 0
Authors
Samuel, Vinay [1]
Aynaou, Houda [2]
Chowdhury, Arijit Ghosh [3]
Ramanan, Karthik Venkat [3]
Chadha, Aman [4]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Georgia Tech Univ, Atlanta, GA USA
[3] Univ Illinois, Urbana, IL USA
[4] Amazon GenAI, Cupertino, CA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a wide range of NLP tasks, showing an ability to reason and apply common sense. A relevant application is using them to create high-quality synthetic datasets for downstream tasks. In this work, we probe whether GPT-4 can be used to augment existing extractive reading comprehension datasets. Automating data annotation has the potential to save the large amounts of time, money, and effort that go into manually labeling datasets. In this paper, we evaluate GPT-4 as a replacement for human annotators on low-resource reading comprehension tasks by comparing performance after fine-tuning and the cost associated with annotation. This work serves as the first analysis of LLMs as synthetic data augmenters for QA systems, highlighting the unique opportunities and challenges. Additionally, we release augmented versions of low-resource datasets that will allow the research community to create further benchmarks for evaluating generated datasets. GitHub repository: https://github.com/vsamuel2003/qa-gpt4
Pages: 325-335
Page count: 11