Short text optimized topic model for service clustering

Authors
Lu J.-W. [1, 2]
Zheng J.-H. [1]
Li D.-N. [1]
Xu J. [1]
Xiao G. [1, 2]
Affiliations
[1] College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou
[2] College of Mechanical and Electrical Engineering, China Jiliang University, Hangzhou
Source
Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science) | 2022, Vol. 56, No. 12
Keywords
representative biterm; service clustering; short text optimization; topic model; word embedding
DOI
10.3785/j.issn.1008-973X.2022.12.010
Abstract
A biterm topic model with word vector and noise filtering (BTM-VN) was proposed to mine high-quality latent topics, improve the accuracy of service clustering, and address the sparsity and noise problems caused by the short-text nature of service description documents. Based on biterms, BTM-VN expanded the service description documents to obtain additional semantic information. A strategy for calculating the probability of representative biterms based on topic distribution information was designed. By calculating a representative biterm matrix during sampling, the weight of the representative biterms under the current topic was increased to reduce the interference of noise words in the service description documents. Moreover, word embeddings were integrated to filter the biterms, reducing the number of biterms with little co-occurrence meaning and addressing the high time cost of biterm-based topic models. Finally, an optimized density peak clustering algorithm was used to cluster the topic distribution matrix trained by BTM-VN. Experimental results show that the service clustering method based on BTM-VN outperforms existing methods on a real-world dataset according to three clustering evaluation metrics. © 2022 Zhejiang University. All rights reserved.
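The biterm extraction and embedding-based filtering steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the filtering rule, so the cosine-similarity threshold, the function names (`extract_biterms`, `filter_biterms`), and the toy two-dimensional embeddings below are all assumptions chosen for illustration.

```python
from itertools import combinations
import math


def extract_biterms(tokens):
    """Return every unordered pair of distinct words (biterm) in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2) if pair[0] != pair[1]]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_biterms(biterms, embeddings, threshold=0.5):
    """Keep biterms whose two words have embedding similarity >= threshold.

    Assumption: a cosine threshold stands in for the paper's filtering rule,
    which the abstract does not specify.
    """
    kept = []
    for w1, w2 in biterms:
        if w1 in embeddings and w2 in embeddings:
            if cosine(embeddings[w1], embeddings[w2]) >= threshold:
                kept.append((w1, w2))
    return kept


# Hypothetical toy embeddings for a three-word service description.
toy_embeddings = {
    "map": [1.0, 0.0],
    "route": [0.9, 0.1],
    "payment": [0.0, 1.0],
}
doc = ["map", "route", "payment"]
biterms = extract_biterms(doc)
kept = filter_biterms(biterms, toy_embeddings, threshold=0.5)
```

In this toy case only the semantically close pair survives filtering, so fewer biterms would be passed to the sampler, which is the source of the time saving the abstract describes.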
Pages: 2416-2425, 2444