A dataset for plain language adaptation of biomedical abstracts

Cited by: 9
Authors
Attal, Kush [1]
Ondov, Brian [1]
Demner-Fushman, Dina [1]
Affiliations
[1] US Natl Lib Med, Lister Hill Natl Ctr Biomed Commun, NIH, Bethesda, MD 20894 USA
Funding
National Institutes of Health (USA)
DOI
10.1038/s41597-022-01920-3
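Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09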
Abstract
Though exponentially growing health-related literature has been made available to a broad audience online, the language of scientific articles can be difficult for the general public to understand. Therefore, adapting this expert-level language into plain language versions is necessary for the public to reliably comprehend the vast health-related literature. Deep Learning algorithms for automatic adaptation are a possible solution; however, gold standard datasets are needed for proper evaluation. Proposed datasets thus far consist of either pairs of comparable professional- and general public-facing documents or pairs of semantically similar sentences mined from such documents. This leads to a trade-off between imperfect alignments and small test sets. To address this issue, we created the Plain Language Adaptation of Biomedical Abstracts dataset. This dataset is the first manually adapted dataset that is both document- and sentence-aligned. The dataset contains 750 adapted abstracts, totaling 7643 sentence pairs. Along with describing the dataset, we benchmark automatic adaptation on the dataset with state-of-the-art Deep Learning approaches, setting baselines for future research.
Pages: 11