Evaluation of web service clustering using Dirichlet Multinomial Mixture model based approach for Dimensionality Reduction in service representation

Cited: 25
Authors
Agarwal, Neha [1 ]
Sikka, Geeta [1 ]
Awasthi, Lalit Kumar [2 ]
Affiliations
[1] Dr BR Ambedkar Natl Inst Technol, Dept Comp Sci & Engn, Jalandhar 144011, Punjab, India
[2] Dr BR Ambedkar Natl Inst Technol, Jalandhar 144011, Punjab, India
Keywords
Web service clustering; Dirichlet Multinomial Mixture (DMM) model; Latent Dirichlet Allocation (LDA); Topic modeling techniques; Clustering techniques; TOPIC MODEL; LDA;
DOI
10.1016/j.ipm.2020.102238
CLC Classification Number
TP [Automation technology; computer technology];
Subject Classification Code
0812;
Abstract
In recent years, the functionality of web services has mainly been described in short natural-language text. Keyword-based searching for web service discovery is therefore not efficient at returning relevant results. When services are clustered according to their similarity, the search space shrinks and search time in the web service discovery process is reduced accordingly. In the domain of web service clustering, topic modeling techniques such as Latent Dirichlet Allocation (LDA), the Correlated Topic Model (CTM), and the Hierarchical Dirichlet Process (HDP) are typically adopted for dimensionality reduction and for representing services in a vector space. However, because services are described as short texts, these techniques are not efficient owing to sparse word co-occurrence, limited content, and similar issues. In this paper, the performance of web service clustering is evaluated by applying various topic modeling techniques with different clustering algorithms on datasets crawled from the ProgrammableWeb repository. The Gibbs Sampling algorithm for the Dirichlet Multinomial Mixture (GSDMM) model is proposed for dimensionality reduction and feature representation of services, in order to overcome the limitations of short-text clustering. Results show that GSDMM combined with K-Means or Agglomerative clustering outperforms all other methods. Clustering performance is evaluated using three extrinsic and two intrinsic evaluation criteria. The dimensionality reduction achieved by GSDMM is 90.88%, 88.84%, and 93.13% on three real-time crawled datasets, which is satisfactory given that clustering performance is also enhanced by deploying this technique.
Pages: 22
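As an illustration of the GSDMM-based pipeline summarized in the abstract, the sketch below implements a minimal collapsed Gibbs sampler for the Dirichlet Multinomial Mixture model and feeds the resulting low-dimensional document-cluster features to K-Means. This is a simplified sketch, not the authors' implementation: the number of DMM clusters K, the hyperparameters alpha and beta, the iteration count, and the toy service descriptions are illustrative assumptions.

```python
# Minimal sketch of the GSDMM-plus-clustering pipeline outlined in the abstract.
# Not the authors' code: K, alpha, beta, n_iters and the toy corpus are assumptions.
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer


def gsdmm_features(docs, vocab_size, K=10, alpha=0.1, beta=0.1, n_iters=30, seed=0):
    """Collapsed Gibbs sampler for the Dirichlet Multinomial Mixture model.

    docs is a list of documents, each a list of integer word ids. Returns an
    (n_docs, K) matrix of cluster responsibilities that serves as the
    low-dimensional feature representation of the services.
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    z = rng.integers(0, K, size=D)     # current cluster label of each document
    m = np.zeros(K)                    # number of documents in each cluster
    n = np.zeros(K)                    # number of words in each cluster
    nw = np.zeros((K, vocab_size))     # per-cluster word counts

    for d, doc in enumerate(docs):     # initialise count tables
        m[z[d]] += 1
        n[z[d]] += len(doc)
        for w in doc:
            nw[z[d], w] += 1

    def log_posterior(doc):
        # log p(z_d = k | everything else), up to an additive constant
        counts = Counter(doc)
        logp = np.log(m + alpha)
        for k in range(K):
            for w, c in counts.items():
                logp[k] += np.log(nw[k, w] + beta + np.arange(c)).sum()
            logp[k] -= np.log(n[k] + vocab_size * beta + np.arange(len(doc))).sum()
        return logp

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            k = z[d]                   # take document d out of its cluster
            m[k] -= 1
            n[k] -= len(doc)
            for w in doc:
                nw[k, w] -= 1
            logp = log_posterior(doc)  # resample its cluster label
            p = np.exp(logp - logp.max())
            k = rng.choice(K, p=p / p.sum())
            z[d] = k
            m[k] += 1
            n[k] += len(doc)
            for w in doc:
                nw[k, w] += 1

    # Soft document-cluster responsibilities used as the reduced feature vectors.
    feats = []
    for doc in docs:
        logp = log_posterior(doc)
        p = np.exp(logp - logp.max())
        feats.append(p / p.sum())
    return np.vstack(feats)


# Toy service descriptions standing in for the crawled ProgrammableWeb data.
descriptions = [
    "rest api for weather forecast and temperature data",
    "send sms and voice messages to mobile phones",
    "weather alerts and climate data by location",
    "programmable voice sms and messaging api",
]
bow = CountVectorizer(stop_words="english").fit_transform(descriptions).toarray()
docs = [list(np.repeat(np.arange(bow.shape[1]), row)) for row in bow]

features = gsdmm_features(docs, vocab_size=bow.shape[1], K=5, n_iters=20)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```

The K-dimensional GSDMM responsibilities replace the original bag-of-words vectors, which is the source of the dimensionality reduction reported in the abstract; the same feature matrix can equally be passed to scikit-learn's AgglomerativeClustering in place of K-Means.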