Using argumentation to extract key sentences from biomedical abstracts

Cited by: 47

Authors:

Ruch, Patrick [1]

Boyer, Celia

Chichester, Christine

Tbahriti, Imad

Geissbuehler, Antoine

Fabry, Paul

Gobeill, Julien

Pillet, Violaine

Rebholz-Schuhmann, Dietrich

Lovis, Christian

Veuthey, Anne-Lise

Affiliations:

[1] Univ Hosp Geneva, SIM, Geneva, Switzerland

[2] Swiss Inst Bioinformat, Swiss Prot Grp, Geneva, Switzerland

[3] Hlth Net Fdn, Geneva, Switzerland

Source:

INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS | 2007 / Vol. 76 / Issue 2-3

Keywords:

machine learning; abstracting and indexing; information storage and retrieval; natural language processing; digital libraries;

DOI:

10.1016/j.ijmedinf.2006.05.002

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

PROBLEM: Keyword assignment has been widely used in MEDLINE to provide an indicative "gist" of the content of articles and to help retrieve biomedical articles. Abstracts are also used for this purpose. However, at usually more than 300 words, MEDLINE abstracts can still be regarded as long documents; we therefore design a system to select a single key sentence. This key sentence must be indicative of the article's content, and we assume that an abstract's conclusions are good candidates. We design and assess the performance of an automatic key sentence selector, which classifies sentences into four argumentative moves: PURPOSE, METHODS, RESULTS and CONCLUSION. METHODS: We rely on Bayesian classifiers trained on automatically acquired data. Feature representation, selection and weighting are reported, and classification effectiveness is evaluated on the four classes using confusion matrices. We also explore the use of simple heuristics to take the position of sentences into account. Recall, precision and F-scores are computed for the CONCLUSION class, for which the F-score reaches 84%. Automatic argumentative classification using Bayesian learners is feasible on MEDLINE abstracts and should help user navigation in such repositories. (c) 2006 Elsevier Ireland Ltd. All rights reserved.
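
As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of a multinomial naive Bayes sentence classifier over the four argumentative moves, with precision, recall and F-score computed for one class. It is not the authors' implementation: the whitespace tokenizer, the add-one smoothing, the class name `NaiveBayes`, the helper `precision_recall_f1`, and the four toy training sentences are all assumptions made for this example, and the position heuristics mentioned in the abstract are not modelled.

```python
import math
from collections import Counter, defaultdict

# The four argumentative moves named in the abstract.
MOVES = ("PURPOSE", "METHODS", "RESULTS", "CONCLUSION")


def tokenize(sentence):
    # Assumption: lowercase whitespace tokens stand in for the paper's features.
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            for word in tokenize(sentence):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, sentence):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(count / total)  # log prior
            for word in tokenize(sentence):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(gold, predicted, target="CONCLUSION"):
    # Per-class scores, treating `target` as the positive class.
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy training data, invented for illustration only.
train = [
    ("we aim to assess the method", "PURPOSE"),
    ("we trained a classifier on abstracts", "METHODS"),
    ("the score reached a high value", "RESULTS"),
    ("we conclude that classification is feasible", "CONCLUSION"),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])
label = model.predict("we conclude that the approach is feasible")
```

In a key sentence selector of the kind described, the sentence assigned to CONCLUSION with the highest score would be the natural candidate to return; how ties and abstracts without a CONCLUSION sentence are handled is left open here.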

Citation:

Pages: 195-200

Page count: 6

References (35 in total):

[1] ABDOU S, 2005, TREC
[2] Aronson AR, 2000, J AM MED INFORM ASSN, P17
[3] Beeferman D, Berger A, Lafferty J. Statistical models for text segmentation [J]. MACHINE LEARNING, 1999, 34 (1-3): 177-210
[4] Cohen G., 2005, FLAIRS C, P431
[5] Domingos P, Pazzani M. On the optimality of the simple Bayesian classifier under zero-one loss [J]. MACHINE LEARNING, 1997, 29 (2-3): 103-130
[6] Dumais S., 1998, Proceedings of the 1998 ACM CIKM International Conference on Information and Knowledge Management, P148, DOI 10.1145/288627.288651
[7] Ehrler F, 2005, BMC BIOINFORMATICS, V6, DOI 10.1186/1471-2105-6-S1-S23
[8] HAHN U, 1998, WHY DISCOUSE STRUCTU, P633
[9] HARMAN D, 1991, J AM SOC INFORM SCI, V42, P7, DOI 10.1002/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P
