A New Automatic Multi-document Text Summarization using Topic Modeling

Cited by: 11
Authors
Roul, Rajendra Kumar [1]
Mehrotra, Samarth [2]
Pungaliya, Yash [2]
Sahoo, Jajati Keshari [3]
Affiliations
[1] Thapar Inst Engn & Technol, Dept Comp Sci, Patiala 147004, Punjab, India
[2] BITS Pilani, Dept Comp Sci, KK Birla Goa Campus, Pilani 403726, Goa, India
[3] BITS Pilani, Dept Math, KK Birla Goa Campus, Pilani 403726, Goa, India
Source
DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY, ICDCIT 2019 | 2019 / Vol. 11319
Keywords
Extractive; Multi-document; ROUGE; Summarization; Topic modeling
DOI
10.1007/978-3-030-05366-6_17
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes a novel methodology for generating an extractive summary from a corpus of documents. Unlike most existing methods, the approach is designed so that the final summary covers all the important topics in the corpus. We propose a heuristic method that uses Latent Dirichlet Allocation to identify the optimum number of independent topics present in the corpus. For each independent topic, important sentences are then identified using a set of word-level and sentence-level features. To keep the final summary coherent, we suggest a novel technique that reorders the selected sentences based on sentence similarity. The use of topic modeling ensures that the important content of the corpus is captured in the extracted summary, which in turn strengthens the summary. Experimental results show that the proposed approach is promising.
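
The abstract does not say how the similarity-based reordering works, so the following is only a minimal sketch of one plausible reading, not the authors' technique. It assumes bag-of-words cosine similarity as the similarity measure and a greedy chain as the ordering rule: the first selected sentence stays in place, and each later position goes to the remaining sentence most similar to the one just placed. The function names `bag_of_words`, `cosine_similarity`, and `reorder_by_similarity` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase the sentence and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reorder_by_similarity(sentences):
    """Assumed greedy rule: keep the first sentence, then repeatedly append
    the remaining sentence most similar to the one just placed.
    Ties are broken in favour of the earlier sentence."""
    if not sentences:
        return []
    vectors = [bag_of_words(s) for s in sentences]
    remaining = list(range(1, len(sentences)))
    order = [0]
    while remaining:
        last = vectors[order[-1]]
        best = max(remaining, key=lambda i: cosine_similarity(last, vectors[i]))
        remaining.remove(best)
        order.append(best)
    return [sentences[i] for i in order]


if __name__ == "__main__":
    # Illustrative sentences only; not taken from the paper's data.
    selected = [
        "Topic models find themes in documents.",
        "Rouge scores compare summaries.",
        "Themes in documents guide sentence selection.",
        "Summaries with high rouge scores are preferred.",
    ]
    for sentence in reorder_by_similarity(selected):
        print(sentence)
```

A greedy chain is cheap and keeps neighbouring sentences lexically related, but it depends on which sentence comes first; the paper's own procedure may use different features or a different ordering criterion.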
Pages: 212-221
Number of pages: 10