How Should We Measure Functional Sameness from Program Source Code? An Exploratory Study on Java Methods

Cited by: 11
Authors
Higo, Yoshiki [1]
Kusumoto, Shinji [1]
Affiliations
[1] Osaka Univ, 1-5 Yamadaoka, Suita, Osaka, Japan
Source
22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014) | 2014
Funding
Japan Society for the Promotion of Science
Keywords
Functionally similar code; Clone detection; Structural similarity; Vocabulary similarity; Method name similarity; Refactoring opportunities; System; CCFinder
DOI
10.1145/2635868.2635886
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Program source code is one of the main targets of software engineering research. A wide variety of research has been conducted on source code, and many studies have leveraged structural, vocabulary, and method signature similarities to measure the functional sameness of source code. In this research, we conducted an empirical study to ascertain how these three similarities should be used to measure functional sameness. We used two large datasets, each of which included approximately 15 million Java method pairs, and measured the three similarities between all the method pairs in each dataset. The relationships between the three similarities were analyzed to determine how each should be used to detect functionally similar code. The results of our study revealed the following. (1) Method names are not always useful for detecting functionally similar code. Methods that share a name are likely to include functionally similar code only when a small number of methods have that name. (2) Existing file-level, method-level, and block-level clone detection techniques often miss functionally similar code generated by copy-and-paste operations between different projects. (3) When we used structural similarity alone to detect functionally similar code, we obtained many false positives. However, most of these false positives can be avoided by using a vocabulary similarity in addition to the structural one. (4) Vocabulary similarity is not suitable for detecting functionally similar code among method pairs in the same file, because such method pairs share many program elements, such as private methods or private fields.
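Finding (3) can be illustrated with a small sketch. The code below is not the measurement procedure used in the paper; it is a minimal illustration under stated assumptions. Structural similarity is approximated by comparing token sequences after identifiers and numbers are replaced with placeholders, vocabulary similarity is approximated by the Jaccard index over identifier words split on camelCase, and the function names, keyword list, and thresholds are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Small, incomplete keyword list (an assumption made for this sketch).
JAVA_KEYWORDS = {
    "int", "long", "double", "boolean", "void", "char", "return",
    "for", "while", "if", "else", "new", "static", "public", "private",
}

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
WORD_RE = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+")


def tokens(src):
    """Split Java-like source text into identifiers, numbers, and symbols."""
    return TOKEN_RE.findall(src)


def is_identifier(tok):
    return re.match(r"[A-Za-z_]", tok) is not None and tok not in JAVA_KEYWORDS


def structural_similarity(a, b):
    """Similarity of token sequences with identifiers and numbers abstracted."""
    def normalize(src):
        out = []
        for tok in tokens(src):
            if is_identifier(tok):
                out.append("ID")
            elif tok.isdigit():
                out.append("NUM")
            else:
                out.append(tok)
        return out
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def vocabulary(src):
    """Lower-cased words obtained by splitting identifiers on camelCase."""
    words = set()
    for tok in tokens(src):
        if is_identifier(tok):
            words.update(part.lower() for part in WORD_RE.findall(tok))
    return words


def vocabulary_similarity(a, b):
    """Jaccard index over the two methods' identifier vocabularies."""
    va, vb = vocabulary(a), vocabulary(b)
    union = va | vb
    return len(va & vb) / len(union) if union else 0.0


def is_candidate(a, b, min_structural=0.8, min_vocabulary=0.3):
    """Report a pair only when BOTH similarities pass (hypothetical thresholds)."""
    return (structural_similarity(a, b) >= min_structural
            and vocabulary_similarity(a, b) >= min_vocabulary)


sum_array = ("int sumArray(int[] values) { int total = 0; "
             "for (int v : values) { total += v; } return total; }")
count_chars = ("int countChars(int[] widths) { int n = 0; "
               "for (int w : widths) { n += w; } return n; }")

# Same shape, disjoint vocabulary: a structure-only check would report this pair.
print(structural_similarity(sum_array, count_chars))
print(vocabulary_similarity(sum_array, count_chars))
print(is_candidate(sum_array, count_chars))
```

In this sketch the two methods are structurally identical after abstraction but share no identifier words, so the combined check rejects the pair that a structure-only check would accept.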
Pages: 294-305
Number of pages: 12