Schemes for Labeling Semantic Code Clones using Machine Learning

Cited by: 8
Authors
Sheneamer, Abdullah [1 ,2 ]
Hazazi, Hanan [3 ]
Roy, Swarup [4 ]
Kalita, Jugal [2 ]
Affiliations
[1] Jazan Univ, Fac Comp Sci & Informat Syst, Jazan 45142, Saudi Arabia
[2] Univ Colorado, Coll Engn & Appl Sci, Colorado Springs, CO 80918 USA
[3] Regis Univ, Coll Comp & Informat Sci, Denver, CO 80221 USA
[4] North Eastern Hill Univ, Dept Informat Technol, Shillong 793022, Meghalaya, India
Source
2017 16TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA) | 2017
Keywords
Machine Learning; Code Clones; Semantic Clones; AST; PDG; Features; Classification;
DOI
10.1109/ICMLA.2017.00-25
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning approaches built to identify code clones perform poorly because of insufficient training samples, and they have been restricted to clones up to Type-III. Most publicly available code clone corpora are incomplete and lack labeled samples for semantic (Type-IV) clones. We present two schemes for labeling all types of clones, including Type-IV clones. We restrict our study to Java code. First, we use an unsupervised approach to label Type-IV clones and have expert Java programmers validate the labels. Next, we present a supervised scheme for labeling (or classifying) unknown samples based on labeled samples derived from our first scheme. We evaluate the performance of our schemes using six well-known Java code clone corpora and report on the quality of the produced clones in terms of kappa agreement, mean error and accuracy scores. Results show that both schemes produce high-quality code clones, facilitating future use of machine learning in detecting Type-IV clones.
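The abstract reports label quality in terms of kappa agreement but does not say which kappa variant was used. As a minimal sketch, assuming Cohen's kappa between two sets of labels (for example, one produced by a labeling scheme and one by an expert), the statistic could be computed as below. The function name `cohen_kappa` and the sample labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each rater's marginal label counts.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for six code pairs (illustrative only, not from the paper).
scheme = ["clone", "clone", "non-clone", "clone", "non-clone", "non-clone"]
expert = ["clone", "clone", "non-clone", "non-clone", "non-clone", "non-clone"]
kappa = cohen_kappa(scheme, expert)
print(round(kappa, 3))
```

Kappa corrects raw agreement for the agreement expected by chance, so it is a stricter measure than accuracy when one label dominates.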
Pages: 981-985
Page count: 5