Parallel inference for cross-collection latent generalized Dirichlet allocation model and applications

Cited by: 2
Authors
Luo, Zhiwen [1 ]
Amayri, Manar [1 ]
Fan, Wentao [2 ,3 ]
Ihou, Koffi Eddy [1 ]
Bouguila, Nizar [1 ]
Affiliations
[1] Concordia University, Concordia Institute for Information Systems Engineering (CIISE), 1515 St. Catherine St. West, Montreal, QC H3G 2W1, Canada
[2] Beijing Normal University-Hong Kong Baptist University (BNU-HKBU), Guangdong Provincial Key Laboratory IRADS, Zhuhai 519088, People's Republic of China
[3] Beijing Normal University-Hong Kong Baptist University (BNU-HKBU), Department of Computer Science, Zhuhai 519088, People's Republic of China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada
Keywords
Cross-collection model; Generalized Dirichlet; Parallel inference; Graphics processing unit; Topic correlation; Comparative text mining; Mixture
DOI
10.1016/j.eswa.2023.121720
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Existing cross-collection topic models with document-topic representations encounter performance bottlenecks on large-scale datasets because of their reliance on Dirichlet priors and conventional inference schemes. These constraints are especially noticeable in models derived from the Latent Dirichlet Allocation (LDA) framework. To address these challenges, this paper introduces the GPU-accelerated cross-collection latent generalized Dirichlet allocation (gccLGDA) model, which combines the benefits of the generalized Dirichlet (GD) distribution with GPU-based parallel inference to deliver improved cross-collection topic modeling. The GD distribution provides a more flexible prior with a more general covariance structure, enabling a more nuanced capture of the relationships between latent topics across different collections. By exploiting the GPU for parallel inference, the model offers scalable and efficient training on expansive datasets, making it well suited to large-scale data challenges. Empirical evaluations in comparative text mining and document classification demonstrate the enhanced performance of the gccLGDA and highlight its advantages over existing cross-collection topic models.
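The abstract's central ingredient, the generalized Dirichlet prior, can be made concrete with a short sketch. The following minimal NumPy example (the function name sample_generalized_dirichlet and the parameter values are illustrative assumptions, not taken from the paper) draws topic-proportion vectors via the Connor-Mosimann stick-breaking construction that defines the GD distribution; it does not reproduce the paper's GPU-parallel inference scheme.

import numpy as np

def sample_generalized_dirichlet(a, b, size=1, rng=None):
    # Hypothetical helper, not from the paper: draws from GD(a, b), where
    # a and b are length K-1 arrays of positive shape parameters and each
    # sample lies on the (K-1)-simplex. Connor-Mosimann construction:
    #   v_j ~ Beta(a_j, b_j),  p_j = v_j * prod_{i<j} (1 - v_i).
    rng = np.random.default_rng() if rng is None else rng
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    K = a.shape[0] + 1
    v = rng.beta(a, b, size=(size, K - 1))      # independent Beta "sticks"
    rem = np.cumprod(1.0 - v, axis=1)           # mass left after each stick
    p = np.empty((size, K))
    p[:, 0] = v[:, 0]
    p[:, 1:K - 1] = v[:, 1:] * rem[:, :-1]      # scale each stick by remaining mass
    p[:, K - 1] = rem[:, -1]                    # leftover mass closes the simplex
    return p

# Example: K = 4 topics gives 2 * (K - 1) = 6 free parameters instead of the
# Dirichlet's K = 4; these extra parameters yield the richer covariance
# structure the abstract refers to. A Dirichlet(alpha) is recovered in the
# special case b_j = alpha_{j+1} + ... + alpha_K.
props = sample_generalized_dirichlet([2.0, 5.0, 1.0], [3.0, 1.0, 4.0], size=4)
print(props, props.sum(axis=1))  # each row sums to 1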
Pages: 15