Clone-Seeker: Effective Code Clone Search Using Annotations

Cited by: 4
Authors
Hammad, Muhammad [1]
Babur, Onder [1, 2]
Basit, Hamid Abdul [3]
Van den Brand, Mark [1]
Affiliations
[1] Eindhoven Univ Technol, Dept Math & Comp Sci, NL-5612 AZ Eindhoven, Netherlands
[2] Wageningen Univ & Res, Informat Technol Grp, NL-6708 PB Wageningen, Netherlands
[3] Prince Sultan Univ, Dept Software Engn, Riyadh 12435, Saudi Arabia
Keywords
Codes; Cloning; Natural languages; Software; Search engines; Semantics; Writing; Annotation; code clone; code clone search; keyword extraction; information retrieval
DOI
10.1109/ACCESS.2022.3145686
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Source code search plays an important role in software development, e.g., for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is needed for tasks like automated software transplantation, software diversification, and software repair. Code clones, which are syntactically or semantically similar code fragments, are perfect candidates for such tasks. Searching for code clones involves a given search query to retrieve the relevant code fragments. We propose a novel approach called Clone-Seeker that focuses on utilizing clone class features in retrieving code clones. For this purpose, we generate metadata for each code clone in the form of a natural language document. The metadata includes a pre-processed list of identifiers from the code clones augmented with a list of keywords indicating the semantics of the code clone. This keyword list can be extracted from a manually annotated general description of the clone class, or automatically generated from the source code of the entire clone class. This approach helps developers to perform code clone search based on a search query written either as source code terms or as natural language. With various experiments, we show that (1) Clone-Seeker is effective in finding clones from the BigCloneBench dataset by applying code queries and natural language queries; (2) Clone-Seeker has a higher recall when searching for semantic code clones (i.e., Type-4) in BigCloneBench than the state-of-the-art; (3) Clone-Seeker is a generalized technique, as it is effective in finding clones in the Project CodeNet dataset by applying code queries and natural language queries; and (4) Clone-Seeker with manual annotation outperforms other variants in finding clones on the basis of natural language queries.
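The metadata-based search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the identifier splitting, the TF-IDF weighting, the cosine ranking, and every name below (`split_identifier`, `build_metadata`, `CloneIndex`, the two sample clones) are assumptions chosen only to show how one index over identifier terms plus keywords can serve both code queries and natural language queries.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def tokenize(text):
    """Turn a code fragment or a natural language query into terms."""
    terms = []
    for word in re.findall(r"[A-Za-z0-9_]+", text):
        terms.extend(split_identifier(word))
    return terms


def build_metadata(identifiers, keywords):
    """Build a metadata document: pre-processed identifiers plus keywords."""
    terms = []
    for name in identifiers:
        terms.extend(split_identifier(name))
    terms.extend(k.lower() for k in keywords)
    return terms


class CloneIndex:
    """Hypothetical TF-IDF index over per-clone metadata documents."""

    def __init__(self, documents):
        counts = {cid: Counter(terms) for cid, terms in documents.items()}
        n = len(counts)
        df = Counter()
        for c in counts.values():
            df.update(c.keys())
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = {cid: self._weigh(c) for cid, c in counts.items()}

    def _weigh(self, counts):
        # Terms unknown to the index are ignored; vectors are L2-normalized.
        vec = {t: tf * self.idf[t] for t, tf in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, top_k=3):
        """Rank clones by cosine similarity to the query, dropping zero scores."""
        q = self._weigh(Counter(tokenize(query)))
        scores = {
            cid: sum(w * vec.get(t, 0.0) for t, w in q.items())
            for cid, vec in self.vectors.items()
        }
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return [(cid, s) for cid, s in ranked[:top_k] if s > 0]


# Illustrative data only; the keywords stand in for clone-class annotations.
clones = {
    "clone_a": build_metadata(
        ["copyFile", "inputStream", "outputStream", "buffer"], ["copy", "file"]
    ),
    "clone_b": build_metadata(["bubbleSort", "array", "swap"], ["sort", "array"]),
}
index = CloneIndex(clones)
natural_language_hits = index.search("copy a file")
code_hits = index.search("int[] array; swap(array, i, j);")
```

In this sketch a natural language query and a code query pass through the same tokenizer, so both are matched against the same metadata documents. The keyword list matters most for natural language queries, where the wording may not overlap with the identifiers at all.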
Pages: 11696-11713
Number of pages: 18