Can LLMs Augment Low-Resource Reading Comprehension Datasets? Opportunities and Challenges

Cited by: 0
Authors
Samuel, Vinay [1]
Aynaou, Houda [2]
Chowdhury, Arijit Ghosh [3]
Ramanan, Karthik Venkat [3]
Chadha, Aman [4]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Georgia Tech Univ, Atlanta, GA USA
[3] Univ Illinois, Urbana, IL USA
[4] Amazon GenAI, Cupertino, CA USA
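Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract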
Large Language Models (LLMs) have demonstrated impressive zero-shot performance on a wide range of NLP tasks, showing the ability to reason and apply common sense. A relevant application is using them to create high-quality synthetic datasets for downstream tasks. In this work, we probe whether GPT-4 can be used to augment existing extractive reading comprehension datasets. Automating data annotation has the potential to save the large amounts of time, money, and effort that go into manually labeling datasets. We evaluate GPT-4 as a replacement for human annotators on low-resource reading comprehension tasks by comparing performance after fine-tuning and the cost associated with annotation. This work serves as the first analysis of LLMs as synthetic data augmenters for QA systems, highlighting the unique opportunities and challenges. Additionally, we release augmented versions of low-resource datasets that will allow the research community to create further benchmarks for evaluating generated datasets. The GitHub repository is available at https://github.com/vsamuel2003/qa-gpt4
Pages: 325-335
Page count: 11