Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization

Cited by: 22
Authors
Arora, Rachit [1]
Ravindran, Balaraman [1]
Affiliation
[1] Indian Inst Technol Madras, Madras 600036, Tamil Nadu, India
Source
ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2008
DOI
10.1109/ICDM.2008.55
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-document summarization deals with computing a summary for a set of related articles such that it gives the user a general view of the events they describe. One objective is that the summary sentences should cover the different events in the documents while conveying the information in as few sentences as possible. Latent Dirichlet Allocation (LDA) can break these documents down into different topics or events. However, to reduce redundant information content, the sentences of the summary need to be orthogonal to each other, since orthogonal vectors have the lowest possible similarity and correlation between them. Singular Value Decomposition (SVD) yields orthogonal representations of vectors; by representing sentences as vectors, we can find the sentences that are orthogonal to each other in the LDA mixture-model weighted term domain. Thus, using LDA we find the different topics in the documents, and using SVD we find the sentences that best represent these topics. Finally, we evaluate the algorithms on the DUC 2002 multi-document summarization tasks, using the ROUGE evaluator to score the summaries. Compared to the DUC 2002 winners, our algorithms gave significantly better ROUGE-1 recall measures.
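The pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's `LatentDirichletAllocation` and `CountVectorizer` in place of the paper's own LDA inference, pools all topic-word weights into a single term-weight vector, and picks one sentence per leading right singular direction of the weighted term-sentence matrix as a stand-in for the paper's sentence-selection step.

```python
# Sketch of LDA + SVD sentence selection (illustrative, not the paper's code).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def summarize(sentences, n_topics=2, n_sentences=2):
    """Pick representative sentences from the leading singular directions
    of the LDA-topic-weighted term-sentence matrix."""
    vec = CountVectorizer()
    X = vec.fit_transform(sentences)                  # sentences x terms
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(X)
    # Pool topic-word weights into one term-weight vector (an assumption;
    # the paper works in the LDA mixture-model weighted term domain).
    term_weights = lda.components_.sum(axis=0)
    term_weights /= term_weights.sum()
    A = (X.toarray() * term_weights).T                # terms x sentences
    # Right singular vectors of A give mutually orthogonal sentence
    # directions; each direction's strongest sentence is selected.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for row in Vt:
        idx = int(np.argmax(np.abs(row)))
        if idx not in chosen:
            chosen.append(idx)
        if len(chosen) == n_sentences:
            break
    return [sentences[i] for i in sorted(chosen)]
```

Because the singular vectors are orthogonal, sentences picked from different singular directions tend to carry low mutual similarity, which is the redundancy-reduction property the abstract argues for.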
Pages: 713-718 (6 pages)