Parallel inference for cross-collection latent generalized Dirichlet allocation model and applications

Cited by: 2
Authors
Luo, Zhiwen [1]
Amayri, Manar [1]
Fan, Wentao [2,3]
Ihou, Koffi Eddy [1]
Bouguila, Nizar [1]
Affiliations
[1] Concordia University, Concordia Institute for Information Systems Engineering (CIISE), 1515 St. Catherine St. West, Montreal, QC H3G 2W1, Canada
[2] Beijing Normal University-Hong Kong Baptist University United International College (BNU-HKBU UIC), Guangdong Provincial Key Laboratory IRADS, Zhuhai 519088, China
[3] Beijing Normal University-Hong Kong Baptist University United International College (BNU-HKBU UIC), Department of Computer Science, Zhuhai 519088, China
Funding
National Natural Science Foundation of China; Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Cross-collection model; Generalized Dirichlet; Parallel inference; Graphics processing unit; Topic correlation; Comparative text mining; Mixture
DOI
10.1016/j.eswa.2023.121720
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing cross-collection topic models with document-topic representations encounter performance bottlenecks on large-scale datasets because they rely on Dirichlet priors and conventional inference schemes; these constraints are especially noticeable in models derived from the Latent Dirichlet Allocation (LDA) framework. To address these challenges, this paper introduces the GPU-accelerated cross-collection latent generalized Dirichlet allocation (gccLGDA) model, which combines the benefits of the generalized Dirichlet (GD) distribution with GPU-based parallel inference for enhanced cross-collection topic modeling. The GD distribution provides a more flexible prior with a more general covariance structure, enabling a more nuanced capture of the relationships between latent topics across different collections. By parallelizing inference on the GPU, the model offers scalable and efficient training on large datasets. Empirical evaluations on comparative text mining and document classification demonstrate the improved performance of the gccLGDA and highlight its advantages over existing cross-collection topic models.
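For context on the two ingredients the abstract names (reference forms only; the paper's own notation and algorithm may differ): the generalized Dirichlet prior, in the standard Connor-Mosimann parameterization, has density

    p(\theta_1, \dots, \theta_K \mid \boldsymbol{\alpha}, \boldsymbol{\beta})
        = \prod_{k=1}^{K} \frac{\Gamma(\alpha_k + \beta_k)}{\Gamma(\alpha_k)\,\Gamma(\beta_k)}
          \, \theta_k^{\alpha_k - 1} \Big( 1 - \sum_{j=1}^{k} \theta_j \Big)^{\gamma_k},

where \gamma_k = \beta_k - \alpha_{k+1} - \beta_{k+1} for k < K and \gamma_K = \beta_K - 1. Its 2K free parameters (versus K + 1 for a Dirichlet over the same simplex) give the richer covariance structure among topic proportions that the abstract refers to; the GD collapses back to the Dirichlet exactly when \beta_k = \alpha_{k+1} + \beta_{k+1}.

The abstract does not spell out the parallel inference scheme, so the following is only a minimal sketch of the generic pattern behind GPU-accelerated topic sampling: score and resample every token's topic in one batched pass on the device, then rebuild the count matrices from the new assignments. CuPy, the array shapes, and all names below are illustrative assumptions, not the paper's implementation.

    import cupy as cp  # GPU array library with a NumPy-like API

    def resample_topics(doc_topic, topic_word, topic_total,
                        doc_ids, word_ids, alpha, beta):
        """One data-parallel sweep of per-token topic resampling.

        doc_topic   (D, K): per-document topic counts
        topic_word  (K, V): per-topic word counts
        topic_total (K,)  : number of tokens assigned to each topic
        doc_ids, word_ids (N,): document/word index of every token
        """
        V = topic_word.shape[1]
        # Unnormalized p(z = k | d, w) for all N tokens at once.
        theta = doc_topic[doc_ids] + alpha                              # (N, K)
        phi = (topic_word[:, word_ids] + beta) / (topic_total[:, None] + V * beta)  # (K, N)
        p = theta * phi.T                                               # (N, K)
        p /= p.sum(axis=1, keepdims=True)
        # Vectorized inverse-CDF categorical sampling, one draw per token.
        u = cp.random.rand(p.shape[0], 1)
        return (p.cumsum(axis=1) < u).sum(axis=1)                       # (N,) new topics

Counts are then recomputed in bulk (e.g. with scatter-adds) rather than updated token-by-token, trading exact collapsed-Gibbs dependencies for throughput, in the style of approximate distributed LDA samplers.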
Pages: 15