Redundancy-free analysis of multi-revision software artifacts

Cited by: 15
Authors
Alexandru, Carol V. [1]
Panichella, Sebastiano [1]
Proksch, Sebastian [1]
Gall, Harald C. [1]
Affiliations
[1] Software Evolution & Architecture Lab (SEAL), Binzmühlestrasse 14, CH-8050 Zurich, Switzerland
Funding
Swiss National Science Foundation;
Keywords
Software analysis; Software evolution; Graph database; Asynchronous computation; Static code analysis; Large-scale; Multi-language; Language-independent; QUALITY; HISTORY; TOOL;
DOI
10.1007/s10664-018-9630-9
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Researchers often analyze several revisions of a software project to obtain historical data about its evolution. For example, they statically analyze the source code and monitor the evolution of certain metrics over multiple revisions. The time and resource requirements for running these analyses often make it necessary to limit the number of analyzed revisions, e.g., by only selecting major revisions or by using a coarse-grained sampling strategy, which could remove significant details of the evolution. Most existing analysis techniques are not designed for the analysis of multi-revision artifacts, and they treat each revision individually. However, the actual difference between two subsequent revisions is typically very small. Thus, tools tailored for the analysis of multiple revisions should only analyze these differences, thereby preventing re-computation and storage of redundant data, improving scalability, and enabling the study of a larger number of revisions. In this work, we propose the Lean Language-Independent Software Analyzer (LISA), a generic framework for representing and analyzing multi-revisioned software artifacts. It employs a redundancy-free, multi-revision representation for artifacts and avoids re-computation by only analyzing changed artifact fragments across thousands of revisions. The evaluation of our approach consists of measuring the effect of each individual technique incorporated, an in-depth study of LISA's resource requirements, and a large-scale analysis over 7 million program revisions of 4,000 software projects written in four languages. We show that the time and space requirements for multi-revision analyses can be reduced by multiple orders of magnitude when compared to traditional, sequential approaches.
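The core idea in the abstract, computing a result once per distinct artifact state and reusing it across revisions, can be illustrated with a small sketch. This is not LISA's implementation: the abstract describes a redundancy-free multi-revision representation, whereas the toy below simply caches a per-file metric by content hash. The names `count_lines` and `analyze_revisions` and the sample data are hypothetical, chosen only for illustration.

```python
import hashlib


def count_lines(source: str) -> int:
    """Toy per-file metric (assumed for illustration): number of non-blank lines."""
    return sum(1 for line in source.splitlines() if line.strip())


def analyze_revisions(revisions):
    """Analyze a sequence of revisions, each a dict mapping path -> file content.

    Returns (per-revision metric totals, number of metric computations run).
    A file whose content is unchanged between revisions is never re-analyzed.
    """
    cache = {}      # content hash -> metric value
    computed = 0    # how many times the metric was actually computed
    totals = []
    for files in revisions:
        total = 0
        for content in files.values():
            key = hashlib.sha1(content.encode("utf-8")).hexdigest()
            if key not in cache:
                cache[key] = count_lines(content)
                computed += 1
            total += cache[key]
        totals.append(total)
    return totals, computed


# Three hypothetical revisions; only one file changes per step.
revisions = [
    {"a.py": "x = 1\n", "b.py": "y = 2\nz = 3\n"},
    {"a.py": "x = 1\n", "b.py": "y = 2\nz = 3\nw = 4\n"},
    {"a.py": "x = 1\n\nv = 5\n", "b.py": "y = 2\nz = 3\nw = 4\n"},
]
totals, computed = analyze_revisions(revisions)
print(totals, computed)
```

A naive per-revision analysis of this example would compute the metric six times (two files in each of three revisions); with reuse, only the four distinct file states are analyzed.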
Pages: 332-380
Page count: 49