Heterogeneous-Length Text Topic Modeling for Reader-Aware Multi-Document Summarization

Cited by: 9
Authors
Qiang, Jipeng [1 ]
Chen, Ping [2 ]
Ding, Wei [2 ]
Wang, Tong [2 ]
Xie, Fei [3 ]
Wu, Xindong [4 ,5 ]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China
[2] Univ Massachusetts, Dept Comp Sci, Boston, MA 02125 USA
[3] Hefei Normal Univ, Dept Comp Sci & Technol, Hefei, Anhui, Peoples R China
[4] Mininglamp Acad Sci, Mininglamp, Beijing, Peoples R China
[5] Hefei Univ Technol, Minist Educ, Key Lab Knowledge Engn Big Data, Hefei, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Topic modeling; LDA; heterogeneous-length text; multi-document summarization;
DOI
10.1145/3333030
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
More and more user comments, such as tweets, are available, and they often reflect user concerns. To meet the demands of readers, a good summary generated from multiple documents should take reader interests, as reflected in reader comments, into account. In this article, we focus on generating a summary from multiple documents while considering reader comments, a task known as reader-aware multi-document summarization (RA-MDS). We present an innovative topic-based method for RA-MDS, which exploits latent topics to produce a summary that is maximally salient and minimally redundant. Since finding latent topics is a crucial step in RA-MDS, we also present a Heterogeneous-length Text Topic Modeling (HTTM) method to extract topics from a corpus containing both news reports and user comments, referred to as heterogeneous-length texts. The latent topics extracted by HTTM thus cover not only important aspects of the event but also aspects that attract reader interest. Comparisons on summarization benchmark datasets confirm that the proposed RA-MDS method is effective in improving the quality of extracted summaries. In addition, experimental results demonstrate that the proposed topic modeling method outperforms existing topic modeling algorithms.
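The abstract outlines the general pipeline: fit one topic model over both the long news reports and the short reader comments, then select sentences whose topic mixtures are salient yet non-redundant. The following is only a minimal illustrative sketch of that general idea, not the authors' HTTM model: it substitutes an off-the-shelf LDA implementation from scikit-learn, and the toy sentences, the two-topic setting, and the 0.9 redundancy threshold are assumptions made purely for the example.

    # Illustrative sketch (NOT the paper's HTTM): topic-based extractive
    # summarization over a mixed corpus of news sentences and reader comments,
    # using standard LDA from scikit-learn as a stand-in topic model.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Hypothetical toy data standing in for news reports and comments.
    news_sentences = [
        "The storm caused widespread flooding in coastal towns.",
        "Emergency services evacuated hundreds of residents overnight.",
        "Officials estimate the damage at several million dollars.",
    ]
    reader_comments = [
        "Why were residents not warned earlier?",
        "Hope the evacuated families find shelter soon.",
    ]

    # Fit one topic model over both text sources, so topics reflect
    # reported facts as well as reader concerns.
    corpus = news_sentences + reader_comments
    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform(corpus)
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    doc_topics = lda.fit_transform(X)          # per-document topic mixtures

    # Corpus-level topic importance: average topic weight across all texts.
    topic_importance = doc_topics.mean(axis=0)

    def cosine(a, b):
        # Cosine similarity between two topic-mixture vectors.
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    # Score each news sentence by how well its topic mixture matches the
    # corpus-level mixture, then greedily pick sentences while penalizing
    # redundancy against sentences already selected.
    sent_topics = doc_topics[: len(news_sentences)]
    scores = sent_topics @ topic_importance
    selected = []
    for idx in np.argsort(-scores):
        if all(cosine(sent_topics[idx], sent_topics[j]) < 0.9 for j in selected):
            selected.append(int(idx))
        if len(selected) == 2:                  # toy summary length budget
            break

    print([news_sentences[i] for i in sorted(selected)])

The greedy salience-minus-redundancy selection here is a common baseline for extractive summarization; the paper's contribution lies in how the topics themselves are estimated from texts of very different lengths, which this sketch does not reproduce.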
Pages: 21