Learning-based summarisation of XML documents

Cited by: 0
Authors
Massih R. Amini
Anastasios Tombros
Nicolas Usunier
Mounia Lalmas
Affiliations
[1] University Pierre and Marie Curie
[2] Department of Computer Science, Queen Mary, University of London
Source
Information Retrieval | 2007 / Vol. 10
Keywords
Text summarisation; XML documents; Machine learning; Ranking functions
DOI
Not available
Abstract
Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach lies in that it is based on features not only from the content of documents, but also from their logical structure. We follow a machine learning, sentence extraction-based summarisation technique. To find which features are more effective for producing summaries, this approach views sentence extraction as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is also effective and well-suited to the task of summarisation in the context of XML documents. Our approach is generic, and is therefore applicable, apart from entire documents, to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.
Pages: 233-255
Page count: 22