Extractive text summarization using clustering-based topic modeling

Cited by: 9

Authors:

Belwal, Ramesh Chandra [1]

Rai, Sawan [2]

Gupta, Atul [1]

Affiliations:

[1] Indian Inst Informat Technol Design & Mfg, Dept Comp Sci & Engn, Jabalpur, India

[2] Bennett Univ, Sch Comp Sci Engn & Technol, Greater Noida, India

Source:

SOFT COMPUTING | 2023 / Vol. 27 / Issue 07

Keywords:

Extractive summarization; Topic modeling; Clustering; Semantic measure; SENTENCE FUSION; DOCUMENTS; FRAMEWORK; FEATURES;

DOI:

10.1007/s00500-022-07534-6

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Text summarization is the process of condensing an input document into a shorter form while preserving its overall meaning. It is primarily achieved in two ways: abstractive and extractive. Extractive summarizers select a few of the best sentences from the input document, while abstractive methods may modify the sentence structure or introduce new sentences. The proposed approach is an extractive text summarization technique in which we extend topic modeling so that it can be applied to multiple lower-level specialized entities (i.e., groups) embedded in a single document. Our goal is to overcome the lack of coherence found in existing summarization techniques. Topic modeling was initially proposed to model text data at the multi-document and word levels, without considering sentence modeling. It was subsequently applied at the sentence level and used for document summarization, but with certain limitations: topic modeling does not perform as expected when applied to a single document at the sentence level. To address this shortcoming, we propose a summarization approach that operates at the individual-document and cluster levels (instead of the sentence level). We aim to choose the best sentence from each group (containing sentences of the same kind) found in the given text, and we select the most suitable topic by evaluating the probability distribution of the words and their respective topics at the cluster level. The method is evaluated on two standard datasets and shows significant performance gains over existing text summarization techniques. In automatic evaluation, the ROUGE scores show a considerable improvement in the F-measure, precision, and recall of the generated summary compared with other techniques. Furthermore, a manual evaluation demonstrated that the proposed approach outperforms current state-of-the-art text summarization approaches.
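
The cluster-then-select idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering algorithm or the topic model, so the sketch assumes a simple deterministic k-means-style clustering over bag-of-words vectors and uses each cluster's unigram word distribution as a stand-in for a topic. All names (`tokenize`, `cluster_sentences`, `summarize`, `STOPWORDS`) are hypothetical.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "on", "for", "with"}


def tokenize(sentence):
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(bags, k, iterations=10):
    """Group sentences with a k-means-style loop (stand-in for the paper's clustering)."""
    # Deterministic farthest-point seeding: start at sentence 0, then repeatedly
    # add the sentence least similar to the seeds chosen so far.
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(bags)) if i not in seeds]
        seeds.append(min(candidates,
                         key=lambda i: max(cosine(bags[i], bags[s]) for s in seeds)))
    centroids = [Counter(bags[s]) for s in seeds]
    labels = [0] * len(bags)
    for _ in range(iterations):
        # Assign each sentence to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(bag, centroids[c])) for bag in bags]
        # Recompute each centroid as the summed word counts of its members.
        for c in range(k):
            merged = Counter()
            for bag, label in zip(bags, labels):
                if label == c:
                    merged.update(bag)
            if merged:
                centroids[c] = merged
    return labels


def summarize(sentences, k=2):
    """Pick one representative sentence per cluster, returned in document order."""
    if not sentences:
        return []
    bags = [Counter(tokenize(s)) for s in sentences]
    k = min(k, len(sentences))
    labels = cluster_sentences(bags, k)
    chosen = []
    for c in range(k):
        members = [i for i, label in enumerate(labels) if label == c]
        if not members:
            continue
        # Cluster-level word distribution, used here in place of a learned topic.
        topic = Counter()
        for i in members:
            topic.update(bags[i])
        total = sum(topic.values())

        def score(i):
            # Average probability of the sentence's words under the cluster distribution.
            words = list(bags[i].elements())
            return sum(topic[w] / total for w in words) / len(words) if words else 0.0

        chosen.append(max(members, key=score))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns at most `k` sentences, one per non-empty cluster. Averaging word probabilities favors short sentences built from a cluster's most frequent words; a real system would more plausibly use a trained topic model and a semantic similarity measure, as the keywords of the record suggest.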

Citation

Pages: 3965-3982

Number of pages: 18
