Text summarization using a trainable summarizer and latent semantic analysis

Cited by: 151
Authors
Yeh, JY
Ke, HR
Yang, WP
Meng, IH
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp & Informat Sci, Hsinchu 30050, Taiwan
[2] Natl Chiao Tung Univ, Digital Lib, Hsinchu 30050, Taiwan
[3] Natl Chiao Tung Univ, Informat Sect Lib, Hsinchu 30050, Taiwan
Keywords
text summarization; corpus-based approach; latent semantic analysis; text relationship map
DOI
10.1016/j.ipm.2004.04.003
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper proposes two approaches to text summarization: a modified corpus-based approach (MCBA) and an LSA-based T.R.M. approach (LSA + T.R.M.). The first is a trainable summarizer that generates summaries from several features, including position, positive keyword, negative keyword, centrality, and resemblance to the title. It exploits two new ideas: (1) sentence positions are ranked to emphasize the significance of different sentence positions, and (2) the score function is trained by a genetic algorithm (GA) to obtain a suitable combination of feature weights. The second approach uses latent semantic analysis (LSA) to derive the semantic matrix of a document or a corpus and uses semantic sentence representations to construct a semantic text relationship map. We evaluate LSA + T.R.M. both on single documents and at the corpus level to investigate the competence of LSA in text summarization. The two novel approaches were evaluated at several compression rates on a corpus of 100 political articles. At a compression rate of 30%, the average f-measures were 49% for MCBA, 52% for MCBA + GA, and 44% and 40% for LSA + T.R.M. at the single-document and corpus levels, respectively. © 2004 Elsevier Ltd. All rights reserved.
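As a rough illustration of the trainable-summarizer idea described in the abstract, the sketch below scores each sentence as a weighted sum of the five named features, keeps the top-scoring sentences for a given compression rate, and computes an f-measure against a reference selection. The linear form of the score, the feature value ranges, the sign convention for the negative keyword weight, and all function and variable names are assumptions made for illustration; the abstract does not specify them, and the GA training of the weights is not shown.

```python
from typing import Dict, List, Sequence

# Feature names taken from the abstract; the numeric encoding is assumed.
FEATURES = (
    "position",
    "positive_keyword",
    "negative_keyword",
    "centrality",
    "title_resemblance",
)


def sentence_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values (assumed linear score function)."""
    return sum(weights[name] * features[name] for name in FEATURES)


def summarize(
    sentence_features: Sequence[Dict[str, float]],
    weights: Dict[str, float],
    compression_rate: float,
) -> List[int]:
    """Return indices of the selected sentences, in document order."""
    n = len(sentence_features)
    k = max(1, round(n * compression_rate))  # keep at least one sentence
    ranked = sorted(
        range(n),
        key=lambda i: sentence_score(sentence_features[i], weights),
        reverse=True,
    )
    return sorted(ranked[:k])


def f_measure(selected: Sequence[int], reference: Sequence[int]) -> float:
    """Harmonic mean of precision and recall over selected sentence indices."""
    overlap = len(set(selected) & set(reference))
    if overlap == 0:
        return 0.0
    precision = overlap / len(set(selected))
    recall = overlap / len(set(reference))
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical feature values for a four-sentence document.
    doc = [
        dict(position=1.0, positive_keyword=0.5, negative_keyword=0.0,
             centrality=0.5, title_resemblance=0.5),
        dict(position=0.5, positive_keyword=0.0, negative_keyword=1.0,
             centrality=0.0, title_resemblance=0.0),
        dict(position=0.25, positive_keyword=1.0, negative_keyword=0.0,
             centrality=1.0, title_resemblance=1.0),
        dict(position=0.0, positive_keyword=0.0, negative_keyword=0.0,
             centrality=0.0, title_resemblance=0.0),
    ]
    # Hypothetical weights; a GA would search over values like these.
    w = dict(position=1.0, positive_keyword=1.0, negative_keyword=-1.0,
             centrality=1.0, title_resemblance=1.0)
    picked = summarize(doc, w, compression_rate=0.5)
    print(picked, f_measure(picked, reference=[0, 1]))
```

In a setup like this, a genetic algorithm would treat the weight dictionary as the chromosome and use the average f-measure over a training corpus as the fitness function, which is one plausible reading of "trained by a genetic algorithm" in the abstract.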
Pages: 75-95
Page count: 21