Clone-Seeker: Effective Code Clone Search Using Annotations

Cited by: 4
Authors
Hammad, Muhammad [1]
Babur, Onder [1, 2]
Basit, Hamid Abdul [3]
Van den Brand, Mark [1]
Affiliations
[1] Eindhoven Univ Technol, Dept Math & Comp Sci, NL-5612 AZ Eindhoven, Netherlands
[2] Wageningen Univ & Res, Informat Technol Grp, NL-6708 PB Wageningen, Netherlands
[3] Prince Sultan Univ, Dept Software Engn, Riyadh 12435, Saudi Arabia
Keywords
Codes; Cloning; Natural languages; Software; Search engines; Semantics; Writing; Annotation; code clone; code clone search; keyword extraction; information retrieval; SOFTWARE;
DOI
10.1109/ACCESS.2022.3145686
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Source code search plays an important role in software development, e.g., for exploratory development or opportunistic reuse of existing code from a code base. Tasks such as automated software transplantation, software diversification, and software repair often require exploring different implementations of the same functionality. Code clones, which are syntactically or semantically similar code fragments, are perfect candidates for such tasks. Searching for code clones uses a given search query to retrieve the relevant code fragments. We propose a novel approach called Clone-Seeker that utilizes clone class features to retrieve code clones. For this purpose, we generate metadata for each code clone in the form of a natural language document. The metadata includes a pre-processed list of identifiers from the code clone, augmented with a list of keywords indicating the semantics of the code clone. This keyword list can be extracted from a manually annotated general description of the clone class, or generated automatically from the source code of the entire clone class. This approach helps developers perform code clone search with a query written either as source code terms or in natural language. Through various experiments, we show that (1) Clone-Seeker is effective in finding clones in the BigCloneBench dataset using both code queries and natural language queries; (2) Clone-Seeker has a higher recall than the state of the art when searching for semantic code clones (i.e., Type-4) in BigCloneBench; (3) Clone-Seeker is a generalized technique, as it is also effective in finding clones in the Project CodeNet dataset using code queries and natural language queries; and (4) Clone-Seeker with manual annotation outperforms the other variants in finding clones on the basis of natural language queries.
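The abstract describes each clone's metadata as a natural language document: pre-processed identifiers from the clone plus keywords describing its clone class, searched with either code terms or natural language. The sketch below illustrates that idea under stated assumptions: camelCase/snake_case identifier splitting, a small hypothetical stop-word list, and TF-IDF with cosine similarity as the ranking function. The abstract does not say which retrieval model or pre-processing Clone-Seeker actually uses, so the names `split_identifier`, `preprocess`, `build_metadata`, and `CloneIndex`, as well as the toy clones, are hypothetical and not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stop list (English function words plus common Java keywords);
# the abstract does not specify Clone-Seeker's actual pre-processing.
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "and", "for", "void",
              "int", "new", "return", "public", "static", "class", "string"}


def split_identifier(token):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = []
    for part in re.split(r"[_\W]+", token):
        # Acronyms (HTTP), capitalized words (Server), lowercase runs, digits.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def preprocess(text):
    """Turn code or a natural-language query into a list of normalized terms."""
    terms = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        terms.extend(w for w in split_identifier(token)
                     if w not in STOP_WORDS and len(w) > 1)
    return terms


def build_metadata(code, keywords):
    """Metadata document: identifier terms of one clone plus clone-class keywords."""
    return preprocess(code) + preprocess(" ".join(keywords))


class CloneIndex:
    """TF-IDF index with cosine ranking (an assumed retrieval model)."""

    def __init__(self, docs):
        # docs maps a clone id to its metadata term list.
        n = len(docs)
        df = Counter(t for terms in docs.values() for t in set(terms))
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = {cid: self._vectorize(terms) for cid, terms in docs.items()}

    def _vectorize(self, terms):
        # Terms unseen in the index get weight 0; vectors are L2-normalized.
        tf = Counter(terms)
        vec = {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def search(self, query, k=5):
        """Return up to k (clone id, cosine score) pairs with a positive score."""
        q = self._vectorize(preprocess(query))
        scores = {cid: sum(w * vec.get(t, 0.0) for t, w in q.items())
                  for cid, vec in self.vectors.items()}
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        return [(cid, s) for cid, s in ranked[:k] if s > 0]


# Toy clones (hypothetical); copy_v2 shares no identifier term with the query below.
clones = {
    "copy_v1": build_metadata(
        "void copyFile(File src, File dst) { Files.copy(src.toPath(), dst.toPath()); }",
        ["copy", "file"]),
    "copy_v2": build_metadata(
        "void duplicate(InputStream in, OutputStream out) throws IOException "
        "{ in.transferTo(out); }",
        ["copy", "file"]),
    "sort_v1": build_metadata(
        "void bubbleSort(int[] arr) { for (int i = 0; i < arr.length; i++) {} }",
        ["sort", "array"]),
}
index = CloneIndex(clones)
print(index.search("copy a file to another file"))
```

In this toy run, `copy_v2` is retrieved for the natural language query only because its clone-class keywords ("copy", "file") were appended to its metadata; its identifiers alone (`duplicate`, `InputStream`, `transferTo`) would not match. This is one way keyword augmentation can help a query reach a semantically similar clone with different identifiers.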
Pages: 11696-11713
Number of pages: 18