Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Cited by: 0
Authors
Koltcov S. [1]
Ignatenko V. [1]
Terpilovskii M. [1]
Rosso P. [1, 2]
Affiliations
[1] Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg
[2] Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia
Keywords
Data Mining and Machine Learning; Data Science; Hierarchical topic models; Natural Language and Speech; Optimal number of topics; Renyi entropy; Topic modeling
DOI
10.7717/PEERJ-CS.608
Abstract
Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections; it additionally allows constructing a hierarchy that represents levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up; three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model exhibits a significant level of instability and, moreover, that the derived numbers of topics are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy. © 2021 Koltcov et al. All Rights Reserved.
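For orientation only, the standard Renyi entropy of order q for a discrete distribution p is H_q(p) = log(sum_i p_i^q) / (1 - q). The sketch below computes this textbook quantity. It is not the quality metric defined in the paper, whose exact form is not given in this record, and the function name `renyi_entropy` is a hypothetical choice made for illustration.

```python
import math


def renyi_entropy(probs, q):
    """Standard Renyi entropy of order q (q > 0, q != 1), in nats.

    Textbook definition only; NOT the paper's hierarchical quality metric.
    """
    if q <= 0 or q == 1:
        raise ValueError("q must be positive and different from 1")
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes contribute nothing to the sum.
    power_sum = sum(p ** q for p in probs if p > 0)
    return math.log(power_sum) / (1.0 - q)


if __name__ == "__main__":
    # A uniform distribution over 4 outcomes has entropy log(4) for any order q.
    print(renyi_entropy([0.25, 0.25, 0.25, 0.25], q=2))
```

For a uniform distribution the value is the logarithm of the number of outcomes regardless of q, while more concentrated distributions give lower values.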
Pages: 1 / 35
Number of pages: 34
Related papers
50 in total
  • [21] Renyi and permutation entropy analysis for assessment of cardiac autonomic neuropathy
    Carricarte-Naranjo, C.
    Cornforth, D. J.
    Sanchez-Rodriguez, L. M.
    Brown, M.
    Estevez, M.
    Machado, A.
    Jelinek, H. F.
    EMBEC & NBC 2017, 2018, 65 : 755 - 758
  • [22] Selective ensemble of SVDDs with Renyi entropy based diversity measure
    Xing, Hong-Jie
    Wang, Xi-Zhao
    PATTERN RECOGNITION, 2017, 61 : 185 - 196
  • [23] Algorithm based on the short-term Renyi entropy and IF estimation for noisy EEG signals analysis
    Lerga, Jonatan
    Saulig, Nicoletta
    Mozetic, Vladimir
    COMPUTERS IN BIOLOGY AND MEDICINE, 2017, 80 : 1 - 13
  • [24] A Topic based Approach for Sentiment Analysis on Twitter Data
    Ficamos, Pierre
    Liu, Yan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (12) : 201 - 205
  • [25] TopChurn: Maximum Entropy Churn Prediction Using Topic Models Over Heterogeneous Signals
    Das, Manirupa
    Elsner, Micha
    Nandi, Arnab
    Ramnath, Rajiv
    WWW'15 COMPANION: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB, 2015, : 291 - 297
  • [26] HEART MURMURS DETECTION AND CHARACTERIZATION USING WAVELET ANALYSIS WITH RENYI ENTROPY
    Daoud, Boutana
    Nayad, Kouras
    Braham, Barkat
    Messaoud, Benidir
    JOURNAL OF MECHANICS IN MEDICINE AND BIOLOGY, 2017, 17 (06)
  • [27] Note on the equivalence relationship between Renyi-entropy based and Tsallis-entropy based image thresholding
    Wang, ST
    Chung, FL
    PATTERN RECOGNITION LETTERS, 2005, 26 (14) : 2309 - 2312
  • [28] A Learning Algorithm of Least Squares Support Vector Machine Based on Factor Analysis and Renyi-Entropy
    Zhao Quanhua
    2009 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND INTELLIGENT SYSTEMS, PROCEEDINGS, VOL 1, 2009, : 66 - 73
  • [29] A novel ant-based clustering algorithm using Renyi entropy
    Zhang, Lei
    Cao, Qixin
    Lee, Jay
    APPLIED SOFT COMPUTING, 2013, 13 (05) : 2643 - 2657
  • [30] Ladar imaging detection of salient map based on PWVD and Renyi entropy
    Xu Yuannan
    Zhao Yuan
    Deng Rong
    Dong Yanbing
    MIPPR 2013: MULTISPECTRAL IMAGE ACQUISITION, PROCESSING, AND ANALYSIS, 2013, 8917