What Disease Does This Patient Have? A Large-Scale Open Domain Question Answering Dataset from Medical Exams

Cited by: 199
Authors
Jin, Di [1]
Pan, Eileen [1]
Oufattole, Nassim [1]
Weng, Wei-Hung [1]
Fang, Hanyi [2]
Szolovits, Peter [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Huazhong Univ Sci & Technol, Tongji Med Coll, Wuhan 430074, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 14
Keywords
natural language processing; open-domain question answering; multi-choice question answering; clinical question answering
DOI
10.3390/app11146421
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Open domain question answering (OpenQA) tasks have been recently attracting more and more attention from the natural language processing (NLP) community. In this work, we present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams. It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively. We implement both rule-based and popular neural methods by sequentially combining a document retriever and a machine comprehension model. Through experiments, we find that even the current best method can only achieve 36.7%, 42.0%, and 70.1% of test accuracy on the English, traditional Chinese, and simplified Chinese questions, respectively. We expect MedQA to present great challenges to existing OpenQA systems and hope that it can serve as a platform to promote much stronger OpenQA models from the NLP community in the future.
Pages: 17