Summarization of biomedical articles using domain-specific word embeddings and graph ranking

Times cited: 25

Authors:

Moradi, Milad [1]

Dashti, Maedeh [2]

Samwald, Matthias [1]

Affiliations:

[1] Med Univ Vienna, Ctr Med Stat Informat & Intelligent Syst, Inst Artificial Intelligence & Decis Support, Vienna, Austria

[2] Islamic Azad Univ, Dept Comp Sci, Esfahan, Iran

Source:

JOURNAL OF BIOMEDICAL INFORMATICS | 2020 / Vol. 107

Keywords:

Natural language processing; Medical text mining; Text summarization; Word embedding; Graph ranking; Deep learning; TEXT

DOI:

10.1016/j.jbi.2020.103452

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

Text summarization tools can help biomedical researchers and clinicians reduce the time and effort needed to acquire important information from numerous documents. It has been shown that the input text can be modeled as a graph, and that important sentences can be selected by identifying central nodes within the graph. However, effectively representing documents, quantifying the relatedness of sentences, and selecting the most informative sentences are the main challenges that need to be addressed in graph-based summarization. In this paper, we address these challenges in the context of biomedical text summarization. We evaluate the efficacy of a graph-based summarizer using different types of context-free and contextualized embeddings. The word representations are produced by pre-training neural language models on large corpora of biomedical texts. The summarizer models the input text as a graph in which the strength of the relation between two sentences is measured using the domain-specific vector representations. We also assess the usefulness of different graph ranking techniques in the sentence selection step of our summarization method. Using the common Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics, we evaluate the performance of our summarizer against various comparison methods. The results show that when the summarizer uses proper combinations of context-free and contextualized embeddings, along with an effective ranking method, it can outperform the other methods. We demonstrate that the best settings of our graph-based summarizer can efficiently improve the informative content of summaries and decrease redundancy.
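The abstract outlines the general pipeline: sentences become graph nodes, embedding similarity weights the edges, and a graph ranking step selects sentences. It does not give the exact configuration. What follows is a minimal Python sketch of that idea under stated assumptions. Sentence vectors are taken as given; here they are hand-made 3-dimensional toy vectors standing in for embeddings from a pre-trained biomedical language model. Edge weights are cosine similarities. Weighted PageRank serves as one example of a graph ranking technique. The names `cosine`, `pagerank`, `summarize`, `SENTENCES`, and `EMBEDDINGS` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pagerank(weights: Sequence[Sequence[float]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> List[float]:
    """Weighted PageRank by power iteration; weights[j][i] is the edge weight j -> i."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Nodes with no outgoing weight spread their score uniformly over all nodes.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        new_scores = []
        for i in range(n):
            # Score flowing into node i, split by each source node's total edge weight.
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break
    return scores


def summarize(sentences: Sequence[str], embeddings: Sequence[Sequence[float]],
              k: int = 2, threshold: float = 0.0) -> List[str]:
    """Pick the k most central sentences and return them in document order."""
    n = len(sentences)
    # Sentence graph: edge i -- j carries the cosine similarity if it exceeds threshold.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim = cosine(embeddings[i], embeddings[j])
                weights[i][j] = sim if sim > threshold else 0.0
    scores = pagerank(weights)
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy data: hand-made 3-d vectors standing in for real biomedical sentence embeddings.
SENTENCES = [
    "Metformin lowers blood glucose in type 2 diabetes.",
    "Metformin reduces hepatic glucose production.",
    "Metformin therapy improved glycaemic control in the trial cohort.",
    "The study was funded by a national research grant.",
]
EMBEDDINGS = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.0, 1.0],
]

if __name__ == "__main__":
    for sentence in summarize(SENTENCES, EMBEDDINGS, k=2):
        print(sentence)
```

With these toy vectors, the off-topic funding sentence has almost no similarity to the others and receives the lowest score. The summary then keeps the two highest-ranked sentences, in their original order. To try a different graph ranking technique, you would replace the `pagerank` call with another centrality measure.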

Pages: 11

References: 61
