Short text topic modelling approaches in the context of big data: taxonomy, survey, and analysis

Cited by: 0
Authors
Belal Abdullah Hezam Murshed
Suresha Mallappa
Jemal Abawajy
Mufeed Ahmed Naji Saif
Hasib Daowd Esmail Al-ariki
Hudhaifa Mohammed Abdulwahab
Affiliations
[1] Mysore University, Department of Studies in Computer Science
[2] Amran University, Department of Computer Science, College of Engineering and IT
[3] Deakin University, School of Information Technology, Faculty of Science, Engineering and Built Environment
[4] Sri Jayachamarajendra College of Engineering, Department of Computer Applications
[5] VTU, Department of Computer Networks and Distributed Systems, Al Saeed Faculty for Engineering and IT
[6] Taiz University, Department of Computer Networks Engineering and Technologies
[7] Sana’a Community College, Department of Computer Application
[8] Ramaiah Institute of Technology
[9] VTU
Source
Artificial Intelligence Review | 2023 / Vol. 56
Keywords
Big data; Social media; Short text topic modeling; Data streaming; Coherence; Sparseness; Deep learning topic modeling
DOI
Not available
Abstract
Social media platforms such as Twitter, Facebook, and Weibo are increasingly embraced by individuals, groups, and organizations as a valuable source of information. This information comes in the form of tweets or posts, which are typically short, high in volume, sparse, and low in density. Because many real-world applications require the semantic interpretation of such short texts, research on Short Text Topic Modeling (STTM), which aims to reveal distinct and coherent latent topics, has recently attracted considerable interest. This article examines the current state of the art in STTM and presents a comprehensive survey and taxonomy of STTM algorithms. It also includes a qualitative and quantitative study of these algorithms and analyzes the strengths and drawbacks of the various STTM techniques. Moreover, it presents a comparative analysis of the topic quality and performance of representative STTM models. The performance evaluation is conducted on two real-world Twitter datasets, the Real-World Pandemic Twitter (RW-Pand-Twitter) dataset and the Real-world Cyberbullying Twitter (RW-CB-Twitter) dataset, using metrics including topic coherence, purity, normalized mutual information (NMI), and accuracy. Finally, the open challenges and future research directions in this promising field are discussed to highlight research trends in STTM. This work is intended both for researchers who want to learn about state-of-the-art short text topic modelling and for those developing new short text topic modelling algorithms.
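Purity and NMI, two of the evaluation metrics named in the abstract, compare the topic assigned to each short text against a ground-truth label. The following is a minimal Python sketch of how these metrics are commonly computed, assuming each document has already been assigned its single most probable topic. NMI here uses arithmetic-mean normalization, which is an assumption and may differ from the exact definition used in the article. The function names `purity` and `nmi` are illustrative only.

```python
import math
from collections import Counter


def purity(true_labels, pred_labels):
    """Fraction of documents that belong to the majority true class of their assigned topic."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    # Group the true labels by the predicted topic (cluster) of each document.
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, []).append(t)
    # For each topic, count how many documents carry its most frequent true label.
    majority_total = sum(Counter(members).most_common(1)[0][1] for members in clusters.values())
    return majority_total / len(true_labels)


def _entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """NMI with arithmetic-mean normalization: I(C;K) / ((H(C) + H(K)) / 2)."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(true_labels)
    joint = Counter(zip(true_labels, pred_labels))
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    # Mutual information from the contingency counts between classes and topics.
    mutual_info = 0.0
    for (t, p), n_tp in joint.items():
        mutual_info += (n_tp / n) * math.log(n * n_tp / (count_true[t] * count_pred[p]))
    h_true, h_pred = _entropy(true_labels), _entropy(pred_labels)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings put every document in one group: treat as identical
    return mutual_info / ((h_true + h_pred) / 2)


if __name__ == "__main__":
    # Toy example with hypothetical labels, not data from the article.
    gold = ["covid", "covid", "covid", "bully", "bully", "bully"]
    topics = [0, 0, 1, 1, 1, 1]
    print("purity:", purity(gold, topics))
    print("NMI:", nmi(gold, topics))
```

Purity rewards topics that are dominated by one class but never penalizes producing many small topics, which is why it is usually reported alongside NMI. In recent scikit-learn versions, `sklearn.metrics.normalized_mutual_info_score` defaults to arithmetic-mean normalization and can be used to cross-check the `nmi` sketch.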
Pages: 5133-5260
Page count: 127