Empirical comparison of text-based mobile apps similarity measurement techniques

Cited by: 25
Authors
Al-Subaihin, Afnan [1 ,2 ]
Sarro, Federica [1 ]
Black, Sue [3 ]
Capra, Licia [1 ]
Affiliations
[1] UCL, Dept Comp Sci, London, England
[2] King Saud Univ, CCIS, Riyadh, Saudi Arabia
[3] Univ Durham, Dept Comp Sci, Durham, England
Keywords
App store analysis; Software clustering; Mobile applications Clustering; Feature extraction; Cluster analysis; COEFFICIENT;
DOI
10.1007/s10664-019-09726-5
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Context: Code-free software similarity detection techniques have been used to support different software engineering tasks, including clustering mobile applications (apps). The way similarity is measured may affect both the efficiency and the quality of clustering solutions. However, there has been no previous comparative study of the feature extraction methods used to guide mobile app clustering.
Objective: In this paper, we investigate different techniques to compute the similarity of apps based on their textual descriptions and evaluate their effectiveness using hierarchical agglomerative clustering.
Method: To this end, we carry out an empirical study comparing five different techniques, based on topic modelling and keyword feature extraction, to cluster 12,664 apps randomly sampled from the Google Play App Store. The comparison is based on three main criteria: silhouette width measure, human judgement and execution time.
Results: The results of our study show that topic modelling and the collocation-based and dependency-based feature extractors perform similarly in detecting app-feature similarity. However, dependency-based feature extraction performs better than any other technique in finding application domain similarity (rho = 0.7, p-value < 0.01).
Conclusions: Current categorisation in the app store studied does not exhibit a good classification quality in terms of the claimed feature space. However, better quality can be achieved using a good feature extraction technique and a traditional clustering method.
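The evaluation idea in the abstract (cluster app descriptions with hierarchical agglomerative clustering, then score the result by silhouette width) can be illustrated with a minimal sketch. This is not the authors' pipeline: plain bag-of-words counts with cosine distance stand in for the five feature extraction techniques the study compares, average linkage is an assumed linkage choice, and the four descriptions and all function names are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    # Bag-of-words term counts: an assumed stand-in for the paper's extractors.
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0

def distance_matrix(texts):
    vecs = [tf_vector(t) for t in texts]
    n = len(vecs)
    return [[0.0 if i == j else cosine_distance(vecs[i], vecs[j]) for j in range(n)]
            for i in range(n)]

def agglomerative(dist, k):
    # Average-linkage agglomerative clustering: merge the closest pair until k clusters remain.
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters)))
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

def silhouette_width(dist, labels):
    # Mean silhouette width over all items; singleton clusters score 0 by convention.
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    if len(groups) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in groups[lab] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own) / len(own)  # mean distance within own cluster
        b = min(sum(dist[i][j] for j in members) / len(members)  # nearest other cluster
                for other, members in groups.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical app descriptions, invented for illustration only.
descriptions = [
    "track running workouts and fitness goals",
    "fitness tracker for running and cycling workouts",
    "edit photos with filters and effects",
    "photo editor with filters stickers and effects",
]
dist = distance_matrix(descriptions)
labels = agglomerative(dist, 2)
score = silhouette_width(dist, labels)
print(labels, round(score, 3))
```

A silhouette width close to 1 indicates compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned items. Swapping `tf_vector` for a different feature extractor while keeping the clustering and scoring steps fixed mirrors the kind of comparison the abstract describes.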
Pages: 3290-3315
Page count: 26