HIERARCHICAL THEME AND TOPIC MODEL FOR SUMMARIZATION

Cited by: 1
Authors
Chien, Jen-Tzung [1]
Chang, Ying-Lan [1]
Affiliations
[1] National Chiao Tung University, Department of Electrical and Computer Engineering, Hsinchu 30010, Taiwan
Source
2013 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP) | 2013
Keywords
Topic model; structural learning; Bayesian nonparametrics; document summarization; Dirichlet
DOI
10.1109/MLSP.2013.6661943
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper presents a hierarchical summarization model that extracts representative sentences from a set of documents. Thematic sentences are selected and topical words are identified using a hierarchical theme and topic model (H2TM), whose latent themes and topics are inferred from the document collection. A tree stick-breaking process is proposed to draw the theme proportions that represent sentences. Structural learning is performed without fixing the number of themes and topics in advance. The H2TM is thus a fine-grained and flexible representation of words and sentences from heterogeneous documents, and thematic sentences are effectively extracted for document summarization. In the experiments, the proposed H2TM outperforms the other compared methods in terms of precision, recall and F-measure.
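As background for the stick-breaking idea mentioned in the abstract, the following is a minimal sketch of the ordinary (flat) truncated stick-breaking construction, not the tree stick-breaking process proposed in the paper, whose details the abstract does not give. The function name `stick_breaking`, the concentration parameter `alpha`, and the truncation level are illustrative assumptions.

```python
import random


def stick_breaking(alpha, truncation, rng):
    """Draw mixture proportions by flat, truncated stick-breaking.

    Assumption: this is the standard GEM-style construction, shown only
    to illustrate how proportions can be drawn without fixing the number
    of components in advance. It is NOT the paper's tree variant.
    """
    remaining = 1.0  # length of the stick that has not been assigned yet
    weights = []
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # leftover mass goes to the last component
    return weights


rng = random.Random(0)  # fixed seed so the draw is reproducible
proportions = stick_breaking(alpha=1.0, truncation=10, rng=rng)
```

A smaller `alpha` tends to concentrate mass on the first few components, while a larger `alpha` spreads it over more of them; the truncation level only bounds the sketch and is not part of the underlying process.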
Pages: 6