Topic modeling algorithms and applications: A survey

Cited by: 123
Authors
Abdelrazek, Aly [1]
Eid, Yomna [1]
Gawish, Eman [1]
Medhat, Walaa [1, 2]
Hassan, Ahmed [1]
Affiliations
[1] Nile Univ, Informat Technol & Comp Sci, CIS, Giza, Egypt
[2] Benha Univ, Fac Comp & Artificial Intelligence, Banha, Egypt
Keywords
Topic modeling; Neural; Probabilistic; Evaluation; LDA; Representation
DOI
10.1016/j.is.2022.102131
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Topic modeling is used in information retrieval to infer the hidden themes in a collection of documents, and thus provides an automatic means to organize, understand, and summarize large collections of textual information. Topic models also offer an interpretable representation of documents that is used in several downstream Natural Language Processing (NLP) tasks. Modeling techniques range from probabilistic graphical models to the more recent neural models. This paper surveys topic models from four aspects. The first aspect groups topic modeling techniques into four categories: algebraic, fuzzy, probabilistic, and neural. We review the wide variety of available models in each category, highlight differences and similarities between models and model categories using a unified perspective, investigate these models' characteristics and limitations, and discuss their proper use cases. The second aspect presents six criteria for the proper evaluation of topic models, from modeling quality to interpretability, stability, efficiency, and beyond. The third aspect covers applications: topic modeling has found applications in various disciplines, owing to its interpretability, and we examine these applications along with some popular software tools that implement some of the models. The fourth aspect reviews available datasets and benchmarks. Using two benchmark datasets, we conducted experiments to compare seven topic models along the proposed metrics. The discussion highlights the differences between the models and their relative suitability for various applications. It notes the relationship between evaluation metrics and proposes four key aspects to help decide which model to use for an application. Our discussion also shows that research trends are moving toward developing and tuning neural topic models and leveraging the power of pre-trained language models. Finally, it highlights research gaps in developing unified benchmarks and evaluation metrics. © 2022 Elsevier Ltd. All rights reserved.
Pages: 17