Analysis and tuning of hierarchical topic models based on Renyi entropy approach

Authors
Koltcov S. [1]
Ignatenko V. [1]
Terpilovskii M. [1]
Rosso P. [1,2]
Affiliations
[1] Laboratory for Social and Cognitive Informatics, National Research University Higher School of Economics, St. Petersburg
[2] Pattern Recognition and Human Language Technology Research Center, Universitat Politècnica de València, Valencia
Keywords
Data Mining and Machine Learning; Data Science; Hierarchical topic models; Natural Language and Speech; Optimal number of topics; Renyi entropy; Topic modeling
DOI
10.7717/PEERJ-CS.608
Abstract
Hierarchical topic modeling is a potentially powerful instrument for determining the topical structure of text collections; it additionally allows constructing a hierarchy that represents levels of topic abstractness. However, parameter optimization in hierarchical models, which includes finding an appropriate number of topics at each level of the hierarchy, remains a challenging task. In this paper, we propose an approach based on Renyi entropy as a partial solution to this problem. First, we introduce a Renyi entropy-based quality metric for hierarchical models. Second, we propose a practical approach to obtaining the “correct” number of topics in hierarchical topic models and show how model hyperparameters should be tuned for that purpose. We test this approach on datasets with a known number of topics, as determined by human mark-up: three of these datasets are in English and one is in Russian. In the numerical experiments, we consider three hierarchical models: the hierarchical latent Dirichlet allocation model (hLDA), the hierarchical Pachinko allocation model (hPAM), and hierarchical additive regularization of topic models (hARTM). We demonstrate that the hLDA model is significantly unstable and, moreover, that the numbers of topics it derives are far from the true numbers for the labeled datasets. For the hPAM model, the Renyi entropy approach allows determining only one level of the data structure. For the hARTM model, the proposed approach allows us to estimate the number of topics for two levels of the hierarchy. © 2021 Koltcov et al. All Rights Reserved.
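For orientation, the quantity named in the title can be illustrated with the textbook Renyi entropy of a discrete probability distribution. The sketch below is only that general definition; the abstract does not specify how the paper's quality metric is built from a topic model, so the function name `renyi_entropy` and its parameters are illustrative assumptions and not the authors' method.

```python
import math


def renyi_entropy(probs, q):
    """Textbook Renyi entropy of order q (natural log) of a discrete distribution.

    Illustrative only: this is the general definition
    H_q(p) = log(sum_i p_i^q) / (1 - q), not the paper's model-quality metric.
    """
    if q <= 0:
        raise ValueError("order q must be positive")
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    # Zero-probability outcomes contribute nothing to the sums below.
    positive = [p for p in probs if p > 0]
    if math.isclose(q, 1.0):
        # The limit q -> 1 recovers the Shannon entropy.
        return -sum(p * math.log(p) for p in positive)
    return math.log(sum(p ** q for p in positive)) / (1.0 - q)
```

For a uniform distribution over n outcomes the value is log(n) at every order q, and for a distribution concentrated on a single outcome it is 0, which makes these two cases convenient sanity checks.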
Pages: 1 / 35
Number of pages: 34