MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering

Cited by: 0
Authors
Pal, Ankit [1 ]
Umapathi, Logesh Kumar [1 ]
Sankarasubbu, Malaikannan [1 ]
Affiliation
[1] Saama AI Res, Chennai, Tamil Nadu, India
Source
CONFERENCE ON HEALTH, INFERENCE, AND LEARNING, 2022, Vol. 174
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper introduces MedMCQA, a new large-scale Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions. More than 194k high-quality AIIMS & NEET PG entrance exam MCQs covering 2.4k healthcare topics and 21 medical subjects were collected, with an average token length of 12.77 and high topical diversity. Each sample contains a question, the correct answer(s), and the other options, and requires deeper language understanding, as the dataset tests 10+ reasoning abilities of a model across a wide range of medical subjects and topics. A detailed explanation of the solution, along with the above information, is provided in this study.
Pages: 248-260 (13 pages)
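
The abstract above describes the structure of each MedMCQA sample: a question, its answer options, the correct answer(s), a detailed explanation, and the medical subject/topic it belongs to. The following is a minimal Python sketch of that record structure together with a simple accuracy metric; the class and field names are illustrative assumptions, not the dataset's official schema.

from dataclasses import dataclass
from typing import List


@dataclass
class MedMCQASample:
    # Illustrative fields inferred from the abstract; not the official schema.
    question: str        # exam question stem
    options: List[str]   # candidate answers (multiple choice)
    correct: List[int]   # index/indices of the correct option(s)
    explanation: str     # detailed explanation of the solution
    subject: str         # one of the 21 medical subjects
    topic: str           # one of the ~2.4k healthcare topics


def accuracy(predictions: List[int], samples: List[MedMCQASample]) -> float:
    """Fraction of samples whose predicted option index is among the correct ones."""
    if not samples:
        return 0.0
    hits = sum(1 for pred, s in zip(predictions, samples) if pred in s.correct)
    return hits / len(samples)


if __name__ == "__main__":
    # Hypothetical sample, purely for illustration.
    sample = MedMCQASample(
        question="Which vitamin deficiency causes scurvy?",
        options=["Vitamin A", "Vitamin B12", "Vitamin C", "Vitamin D"],
        correct=[2],
        explanation="Scurvy results from vitamin C (ascorbic acid) deficiency.",
        subject="Biochemistry",
        topic="Vitamins",
    )
    print(accuracy([2], [sample]))  # prints 1.0
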
Related Papers
50 in total
  • [1] Winnowing Knowledge for Multi-choice Question Answering
    Li, Yeqiu
    Zou, Bowei
    Li, Zhifeng
    Aw, Ai Ti
    Hong, Yu
    Zhu, Qiaoming
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1157 - 1165
  • [2] Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data
    Yu, Dian
    Sun, Kai
    Yu, Dong
    Cardie, Claire
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 56 - 68
  • [3] MLEC-QA: A Chinese Multi-Choice Biomedical Question Answering Dataset
    Li, Jing
    Zhong, Shangping
    Chen, Kaizhi
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 8862 - 8874
  • [4] Development of a large-scale medical visual question-answering dataset
    Zhang, Xiaoman
    Wu, Chaoyi
    Zhao, Ziheng
    Lin, Weixiong
    Zhang, Ya
    Wang, Yanfeng
    Xie, Weidi
    COMMUNICATIONS MEDICINE, 2024, 4 (01)
  • [5] EXAMS: A Multi-Subject High School Examinations Dataset for Cross-Lingual and Multilingual Question Answering
    Hardalov, Momchil
    Mihaylov, Todor
    Zlatkova, Dimitrina
    Dinkov, Yoan
    Koychev, Ivan
    Nakov, Preslav
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5427 - 5444
  • [6] Two layers LSTM with attention for multi-choice question answering in exams
    Li, Yongbin
    INTERNATIONAL CONFERENCE ON FUNCTIONAL MATERIALS AND CHEMICAL ENGINEERING (ICFMCE 2017), 2018, 323
  • [7] A Legal Multi-Choice Question Answering Model Based on BERT and Attention
    Chen, Guibin
    Luo, Xudong
    Zhu, Junlin
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT IV, KSEM 2023, 2023, 14120 : 250 - 266
  • [8] Assessing and Optimizing Large Language Models on Spondyloarthritis Multi-Choice Question Answering: Protocol for Enhancement and Assessment
    Wang, Anan
    Wu, Yunong
    Ji, Xiaojian
    Wang, Xiangyang
    Hu, Jiawen
    Zhang, Fazhan
    Zhang, Zhanchao
    Pu, Dong
    Tang, Lulu
    Ma, Shikui
    Liu, Qiang
    Dong, Jing
    He, Kunlun
    Li, Kunpeng
    Teng, Da
    Li, Tao
    JMIR RESEARCH PROTOCOLS, 2024, 13
  • [9] What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams
    Jin, Di
    Pan, Eileen
    Oufattole, Nassim
    Weng, Wei-Hung
    Fang, Hanyi
    Szolovits, Peter
    APPLIED SCIENCES-BASEL, 2021, 11 (14)
  • [10] SKR-QA: Semantic ranking and knowledge revise for multi-choice question answering
    Ren, Mucheng
    Huang, Heyan
    Gao, Yang
    NEUROCOMPUTING, 2021, 459 : 142 - 151