Information-theoretic software clustering

Cited by: 152
Authors
Andritsos, P
Tzerpos, V
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 2E4, Canada
[2] York Univ, Dept Comp Sci & Engn, N York, ON M3J 1P3, Canada
Keywords
reverse engineering; reengineering; architecture reconstruction; clustering; information theory
DOI
10.1109/TSE.2005.25
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
The majority of the algorithms in the software clustering literature utilize structural information to decompose large software systems. Approaches using other attributes, such as file names or ownership information, have also demonstrated merit. At the same time, existing algorithms commonly deem all attributes of the software artifacts being clustered as equally important, a rather simplistic assumption. Moreover, no method that can assess the usefulness of a particular attribute for clustering purposes has been presented in the literature. In this paper, we present an approach that applies information-theoretic techniques in the context of software clustering. Our approach allows the application of weighting schemes that reflect the relative importance of different attributes. We introduce LIMBO, a scalable hierarchical clustering algorithm based on the minimization of information loss when clustering a software system. We also present a method that can assess the usefulness of any nonstructural attribute in a software clustering context. We applied LIMBO to three large software systems in a number of experiments. The results indicate that this approach produces clusterings that come close to decompositions prepared by system experts. Experimental results were also used to validate our usefulness assessment method. Finally, we experimented with well-established weighting schemes from information retrieval, web search, and data clustering. We report which of these weighting schemes show merit in the decomposition of software systems.
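The abstract describes LIMBO as hierarchical clustering that minimizes information loss. The following is a minimal sketch of that idea, assuming the agglomerative Information Bottleneck formulation, in which merging clusters c_i and c_j costs (p(c_i) + p(c_j)) times the weighted Jensen-Shannon divergence of their attribute distributions p(A | c). It is a naive O(n^3) greedy loop, not LIMBO itself: it omits whatever summarization makes LIMBO scalable and applies no attribute weighting. The function names (`kl`, `merge`, `info_loss`, `aib`), the uniform prior p(c) = 1/n, and the file-to-header input are hypothetical illustrations.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Dist = Dict[str, float]  # conditional distribution p(attribute | cluster)


def kl(p: Dist, q: Dist) -> float:
    """KL divergence D(p || q) in bits; q must be > 0 wherever p > 0."""
    return sum(pv * math.log2(pv / q[a]) for a, pv in p.items() if pv > 0)


def merge(pi: float, di: Dist, pj: float, dj: Dist) -> Tuple[float, Dist]:
    """Weight and attribute distribution of the union of two clusters."""
    p = pi + pj
    keys = set(di) | set(dj)
    return p, {a: (pi * di.get(a, 0.0) + pj * dj.get(a, 0.0)) / p for a in keys}


def info_loss(pi: float, di: Dist, pj: float, dj: Dist) -> float:
    """Drop in I(C; A) caused by a merge: (p_i + p_j) * weighted Jensen-Shannon divergence."""
    p, m = merge(pi, di, pj, dj)
    js = (pi / p) * kl(di, m) + (pj / p) * kl(dj, m)
    return p * js


def aib(features: Dict[str, List[str]], k: int) -> List[List[str]]:
    """Greedily merge the cheapest pair until k clusters remain (naive, no summarization)."""
    n = len(features)
    clusters = []  # each entry: (member names, p(c), p(A | c))
    for name, attrs in features.items():
        counts = Counter(attrs)  # assumes every artifact has at least one attribute
        total = sum(counts.values())
        clusters.append(([name], 1.0 / n, {a: c / total for a, c in counts.items()}))
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                loss = info_loss(clusters[i][1], clusters[i][2], clusters[j][1], clusters[j][2])
                if best is None or loss < best[0]:
                    best = (loss, i, j)
        _, i, j = best
        p, d = merge(clusters[i][1], clusters[i][2], clusters[j][1], clusters[j][2])
        merged = (clusters[i][0] + clusters[j][0], p, d)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return [sorted(c[0]) for c in clusters]


# Hypothetical input: each source file described by the headers it includes.
example = {
    "parser.c": ["lexer.h", "ast.h"],
    "lexer.c": ["lexer.h", "ast.h"],
    "net.c": ["socket.h", "buffer.h"],
    "http.c": ["socket.h", "buffer.h"],
}
result = aib(example, k=2)
print(result)
```

On this hypothetical input, the two files that include `lexer.h` and `ast.h` end up in one cluster and the two that include `socket.h` and `buffer.h` in the other, because merging files with identical attribute distributions loses no information, while merging files with disjoint attributes does.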
Pages: 150-165
Number of pages: 16