Topic modeling algorithms and applications: A survey

Times cited: 123

Authors:

Abdelrazek, Aly [1]

Eid, Yomna [1]

Gawish, Eman [1]

Medhat, Walaa [1,2]

Hassan, Ahmed [1]

Affiliations:

[1] Nile Univ, Informat Technol & Comp Sci, CIS, Giza, Egypt

[2] Benha Univ, Fac Comp & Artificial Intelligence, Banha, Egypt

Source:

INFORMATION SYSTEMS | 2023 / Vol. 112

Keywords:

Topic modeling; Neural; Probabilistic; Evaluation; LDA; REPRESENTATION

DOI:

10.1016/j.is.2022.102131

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

Topic modeling is used in information retrieval to infer the hidden themes in a collection of documents, and thus provides an automatic means to organize, understand, and summarize large collections of textual information. Topic models also offer an interpretable representation of documents that is used in several downstream Natural Language Processing (NLP) tasks. Modeling techniques range from probabilistic graphical models to more recent neural models. This paper surveys topic models from four aspects. The first aspect categorizes topic modeling techniques into four categories: algebraic, fuzzy, probabilistic, and neural. We review the wide variety of available models in each category, highlight differences and similarities between models and model categories using a unified perspective, investigate these models' characteristics and limitations, and discuss their appropriate use cases. The second aspect presents six criteria for the proper evaluation of topic models, from modeling quality to interpretability, stability, efficiency, and beyond. Owing to its interpretability, topic modeling has found applications in various disciplines; the third aspect examines these applications, along with some popular software tools that implement some of the models. The fourth aspect reviews available datasets and benchmarks. Using two benchmark datasets, we conducted experiments to compare seven topic models along the proposed metrics. The discussion highlights the differences between the models and their relative suitability for various applications. It notes the relationships among the evaluation metrics and proposes four key aspects to help decide which model to use for an application. Our discussion also shows that research is trending toward developing and tuning neural topic models and leveraging the power of pre-trained language models. Finally, it highlights research gaps in developing unified benchmarks and evaluation metrics. (c) 2022 Elsevier Ltd. All rights reserved.
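As a rough illustration of the "algebraic" category named in the abstract, the sketch below shows latent semantic analysis (LSA), a classic algebraic topic model that factorizes a term-document matrix with a truncated singular value decomposition. This is not code from the survey: the function name `lsa_topics`, the whitespace tokenization, and the use of raw counts (rather than TF-IDF weighting) are all simplifying assumptions made for illustration.

```python
import numpy as np


def lsa_topics(docs, n_topics=2, top_n=3):
    """Hypothetical LSA sketch: returns top words per topic and document embeddings."""
    # Build a vocabulary and a raw term-document count matrix (terms x docs).
    # Assumption: naive lowercase whitespace tokenization, no TF-IDF weighting.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1

    # Thin SVD; keeping only the first n_topics singular triplets
    # gives the rank-n_topics (truncated) approximation of X.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    topics = []
    for k in range(n_topics):
        u = U[:, k]
        # Singular vectors have an arbitrary sign; flip so the
        # largest-magnitude loading is positive before ranking words.
        if u[np.argmax(np.abs(u))] < 0:
            u = -u
        top = np.argsort(-u)[:top_n]
        topics.append([vocab[i] for i in top])

    # Each row is one document's coordinates in the latent topic space.
    doc_embeddings = (np.diag(s[:n_topics]) @ Vt[:n_topics]).T
    return topics, doc_embeddings


docs = ["cat dog pet", "dog cat animal", "stock market price", "market price trade"]
topics, emb = lsa_topics(docs, n_topics=2, top_n=3)
print(topics)
print(emb.shape)
```

Note that `n_topics` must not exceed the smaller of the vocabulary size and the number of documents. Practical LSA pipelines typically apply TF-IDF weighting and proper tokenization before the decomposition.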


Pages: 17

Number of references: 110
