Extractive text summarization using clustering-based topic modeling

Cited by: 9
Authors
Belwal, Ramesh Chandra [1]
Rai, Sawan [2]
Gupta, Atul [1]
Affiliations
[1] Indian Inst Informat Technol Design & Mfg, Dept Comp Sci & Engn, Jabalpur, India
[2] Bennett Univ, Sch Comp Sci Engn & Technol, Greater Noida, India
Keywords
Extractive summarization; Topic modeling; Clustering; Semantic measure; Sentence fusion; Documents; Framework; Features
DOI
10.1007/s00500-022-07534-6
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text summarization is the process of condensing an input document into a shorter form while preserving its overall meaning. It is achieved primarily in two ways: abstractive and extractive. Extractive summarizers select a few of the best sentences from the input document, while abstractive methods may modify the sentence structure or introduce new sentences. The proposed approach is an extractive text summarization technique in which we extend topic modeling so that it can be applied to multiple lower-level specialized entities (i.e., groups) embedded in a single document. Our goal is to overcome the lack of coherence found in existing summarization techniques. Topic modeling was initially proposed to model text data at the multi-document and word levels, without considering sentence modeling. It was subsequently applied at the sentence level and used for document summarization, but with certain limitations: topic modeling does not perform as expected when applied to a single document at the sentence level. To address this shortcoming, we propose a summarization approach that operates at the level of the individual document and its clusters (instead of the sentence level). We aim to choose the best sentence from each group (containing sentences of the same kind) found in the given text. We attempt to select the most suitable topic by evaluating the probability distribution of the words and their respective topics at the cluster level. The method is evaluated on two standard datasets and shows significant performance gains over existing text summarization techniques. Under automatic evaluation, the ROUGE metrics show a considerable improvement in the F-measure, precision, and recall of the generated summary compared to other text summarization techniques. Furthermore, a manual evaluation has demonstrated that the proposed approach outperforms the current state-of-the-art text summarization approaches.
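The cluster-then-select idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: the record does not specify their clustering method or topic model, so the code below assumes a greedy cosine-similarity clustering and uses each cluster's unigram word distribution as a simple stand-in for a topic. The function names (`tokenize`, `cluster_sentences`, `summarize`) and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase word tokens; a fuller system would also drop stop words."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(bags, threshold=0.2):
    """Greedy one-pass clustering (an assumption, not the paper's method):
    each sentence joins its most similar cluster or starts a new one."""
    clusters = []   # lists of sentence indices
    centroids = []  # summed bag-of-words per cluster
    for i, bag in enumerate(bags):
        sims = [cosine(bag, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            clusters[best].append(i)
            centroids[best] = centroids[best] + bag
        else:
            clusters.append([i])
            centroids.append(Counter(bag))
    return clusters


def summarize(sentences, threshold=0.2):
    """Pick one sentence per cluster: the one whose words are, on average,
    most probable under the cluster-level word distribution."""
    bags = [Counter(tokenize(s)) for s in sentences]
    picked = []
    for members in cluster_sentences(bags, threshold):
        counts = Counter()
        for i in members:
            counts.update(bags[i])
        total = sum(counts.values())

        def score(i):
            words = list(bags[i].elements())
            if not words:
                return 0.0
            return sum(counts[w] / total for w in words) / len(words)

        picked.append(max(members, key=score))
    # Keep the selected sentences in their original document order.
    return [sentences[i] for i in sorted(picked)]
```

Called on a list of sentences from one document, `summarize` returns one representative sentence per cluster. A closer approximation of the approach described above would replace the unigram distribution with a real topic model fitted per cluster.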
Pages: 3965-3982
Page count: 18