Pre-trained language models with domain knowledge for biomedical extractive summarization

Cited by: 40
Authors
Xie Q. [1]
Bishop J.A. [1]
Tiwari P. [2]
Ananiadou S. [1,3]
Affiliations
[1] National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester
[2] Department of Computer Science, Aalto University, Espoo
[3] Alan Turing Institute, London
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK
Keywords
Domain knowledge; Extractive summarization; PICO elements; Pre-trained language models;
DOI
10.1016/j.knosys.2022.109460
Abstract
Biomedical text summarization is a critical task for comprehending the ever-growing volume of biomedical literature. Pre-trained language models (PLMs) with transformer-based architectures have been shown to greatly improve performance on biomedical text mining tasks. However, existing methods for text summarization generally fine-tune PLMs on the target corpora directly and do not consider how fine-grained domain knowledge, such as the PICO (Population, Intervention, Comparison, Outcome) elements used in evidence-based medicine, can help to identify the context needed for generating coherent summaries. To fill this gap, we propose KeBioSum, a novel knowledge infusion training framework, and experiment with a number of PLMs as bases for extractive summarization of biomedical literature. We investigate generative and discriminative training techniques to fuse domain knowledge (i.e., PICO elements) into knowledge adapters, and apply adapter fusion to efficiently inject the knowledge adapters into the base PLMs for fine-tuning on the extractive summarization task. Experimental results on three biomedical literature datasets show that existing PLMs (BERT, RoBERTa, BioBERT, and PubMedBERT) are improved by incorporating the KeBioSum knowledge adapters, and our model outperforms strong baselines. © 2022 The Author(s)
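The mechanism the abstract describes (lightweight knowledge adapters whose outputs are combined by an attention-based adapter fusion layer inside a PLM) can be illustrated with a minimal PyTorch sketch. This follows the generic bottleneck-adapter and AdapterFusion recipes from the adapter literature, not the authors' released code; names such as "PicoAdapter" and "AdapterFusion", and all dimensions, are hypothetical placeholders.

import torch
import torch.nn as nn

class PicoAdapter(nn.Module):
    """Bottleneck adapter: down-project -> nonlinearity -> up-project, plus a residual connection."""
    def __init__(self, hidden_dim: int = 768, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(self.act(self.down(hidden)))

class AdapterFusion(nn.Module):
    """Attention over several adapters' outputs: the PLM hidden state is the
    query, each adapter output supplies a key/value pair, so the model learns
    per-token weights for mixing the injected knowledge."""
    def __init__(self, hidden_dim: int = 768):
        super().__init__()
        self.query = nn.Linear(hidden_dim, hidden_dim)
        self.key = nn.Linear(hidden_dim, hidden_dim)
        self.value = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, hidden: torch.Tensor, adapter_outs: list) -> torch.Tensor:
        stacked = torch.stack(adapter_outs, dim=2)           # (batch, seq, n_adapters, hidden)
        q = self.query(hidden).unsqueeze(2)                  # (batch, seq, 1, hidden)
        k, v = self.key(stacked), self.value(stacked)
        scores = (q * k).sum(-1) / hidden.size(-1) ** 0.5    # (batch, seq, n_adapters)
        weights = scores.softmax(dim=-1).unsqueeze(-1)       # attention over adapters
        return hidden + (weights * v).sum(dim=2)             # residual mix of adapter outputs

# Usage: one generative and one discriminative PICO adapter, fused per token.
hidden = torch.randn(2, 128, 768)            # (batch, seq_len, hidden) from a PLM layer
gen_adapter, disc_adapter = PicoAdapter(), PicoAdapter()
fusion = AdapterFusion()
fused = fusion(hidden, [gen_adapter(hidden), disc_adapter(hidden)])
print(fused.shape)                           # torch.Size([2, 128, 768])

In the paper's setting, the adapters would first be trained on PICO-labeled data (generatively or discriminatively) and then held fixed, while the fusion layer and a sentence-level classification head are trained during fine-tuning for extractive summarization.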
Related papers
50 in total
  • [1] Biomedical-domain pre-trained language model for extractive summarization
    Du, Yongping
    Li, Qingxiao
    Wang, Lulin
    He, Yanqing
    KNOWLEDGE-BASED SYSTEMS, 2020, 199
  • [2] Pre-trained Language Models in Biomedical Domain: A Systematic Survey
    Wang, Benyou
    Xie, Qianqian
    Pei, Jiahuan
    Chen, Zhihong
    Tiwari, Prayag
    Li, Zhao
    Fu, Jie
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [3] Continual knowledge infusion into pre-trained biomedical language models
    Jha, Kishlay
    Zhang, Aidong
    BIOINFORMATICS, 2022, 38 (02) : 494 - 502
  • [4] Evaluating the Summarization Comprehension of Pre-Trained Language Models
    Chernyshev, D. I.
    Dobrov, B. V.
    LOBACHEVSKII JOURNAL OF MATHEMATICS, 2023, 44 (08) : 3028 - 3039
  • [5] Knowledge Enhanced Pre-trained Language Model for Product Summarization
    Yin, Wenbo
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Liu, Lang
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 263 - 273
  • [6] Knowledge Rumination for Pre-trained Language Models
    Yao, Yunzhi
    Wang, Peng
    Mao, Shengyu
    Tan, Chuanqi
    Huang, Fei
    Chen, Huajun
    Zhang, Ningyu
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 3387 - 3404
  • [7] Knowledge Inheritance for Pre-trained Language Models
    Qin, Yujia
    Lin, Yankai
    Yi, Jing
    Zhang, Jiajie
    Han, Xu
    Zhang, Zhengyan
    Su, Yusheng
    Liu, Zhiyuan
    Li, Peng
    Sun, Maosong
    Zhou, Jie
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3921 - 3937
  • [8] Low Resource Summarization using Pre-trained Language Models
    Munaf, Mubashir
    Afzal, Hammad
    Mahmood, Khawir
    Iltaf, Naima
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (10)
  • [9] Modeling Content Importance for Summarization with Pre-trained Language Models
    Xiao, Liqiang
    Wang, Lu
    He, Hao
    Jin, Yaohui
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 3606 - 3611