Clone-Seeker: Effective Code Clone Search Using Annotations

Cited by: 4
Authors
Hammad, Muhammad [1]
Babur, Onder [1, 2]
Basit, Hamid Abdul [3]
Van den Brand, Mark [1]
Affiliations
[1] Eindhoven Univ Technol, Dept Math & Comp Sci, NL-5612 AZ Eindhoven, Netherlands
[2] Wageningen Univ & Res, Informat Technol Grp, NL-6708 PB Wageningen, Netherlands
[3] Prince Sultan Univ, Dept Software Engn, Riyadh 12435, Saudi Arabia
Keywords
Codes; Cloning; Natural languages; Software; Search engines; Semantics; Writing; Annotation; code clone; code clone search; keyword extraction; information retrieval
DOI
10.1109/ACCESS.2022.3145686
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Source code search plays an important role in software development, e.g., for exploratory development or opportunistic reuse of existing code from a code base. Often, exploration of different implementations with the same functionality is needed for tasks like automated software transplantation, software diversification, and software repair. Code clones, which are syntactically or semantically similar code fragments, are perfect candidates for such tasks. Searching for code clones involves using a given search query to retrieve the relevant code fragments. We propose a novel approach called Clone-Seeker that focuses on utilizing clone class features in retrieving code clones. For this purpose, we generate metadata for each code clone in the form of a natural language document. The metadata includes a pre-processed list of identifiers from the code clone, augmented with a list of keywords indicating the semantics of the code clone. This keyword list can be extracted from a manually annotated general description of the clone class, or automatically generated from the source code of the entire clone class. This approach helps developers to perform code clone search based on a search query written either as source code terms or as natural language. With various experiments, we show that (1) Clone-Seeker is effective in finding clones from the BigCloneBench dataset by applying code queries and natural language queries; (2) Clone-Seeker has a higher recall when searching for semantic code clones (i.e., Type-4) in BigCloneBench than the state-of-the-art; (3) Clone-Seeker is a generalized technique, as it is effective in finding clones in the Project CodeNet dataset by applying code queries and natural language queries; and (4) Clone-Seeker with manual annotation outperforms other variants in finding clones on the basis of natural language queries.
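The metadata-based search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which retrieval model Clone-Seeker uses, so TF-IDF with cosine similarity is an assumption here, and the names `split_identifier`, `build_metadata`, and `CloneIndex`, as well as the two sample clones, are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]


def tokenize(text):
    """Tokenize code or natural language into lowercase terms."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]+", text):
        tokens.extend(split_identifier(word))
    return tokens


def build_metadata(identifiers, keywords):
    """Metadata document: pre-processed identifiers plus semantic keywords."""
    return tokenize(" ".join(identifiers)) + tokenize(" ".join(keywords))


class CloneIndex:
    """Hypothetical index; TF-IDF ranking is an assumption, not from the paper."""

    def __init__(self, clones):
        # clones maps clone_id -> (identifiers, keywords)
        self.docs = {cid: build_metadata(ids, kws)
                     for cid, (ids, kws) in clones.items()}
        n = len(self.docs)
        df = Counter(t for doc in self.docs.values() for t in set(doc))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vectors = {cid: self._vectorize(doc)
                        for cid, doc in self.docs.items()}

    def _vectorize(self, tokens):
        # Terms outside the indexed vocabulary are dropped.
        tf = Counter(tokens)
        return {t: f * self.idf[t] for t, f in tf.items() if t in self.idf}

    @staticmethod
    def _cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query, top_k=5):
        """Rank clones for a code query or a natural language query."""
        q = self._vectorize(tokenize(query))
        scored = [(cid, self._cosine(q, v)) for cid, v in self.vectors.items()]
        scored = [(cid, s) for cid, s in scored if s > 0.0]
        return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]


# Illustrative data only; these clones are not taken from BigCloneBench.
index = CloneIndex({
    "copy_file": (["copyFile", "inputStream", "outputStream"], ["copy", "file"]),
    "bubble_sort": (["bubbleSort", "arr", "swap"], ["sort", "array"]),
})
nl_hits = index.search("copy a file")
code_hits = index.search("bubbleSort(arr)")
```

Because identifiers and keywords share one vocabulary after splitting, the same index serves both query styles: `nl_hits` ranks `copy_file` first for the natural language query, and `code_hits` ranks `bubble_sort` first for the code query.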
Pages: 11696-11713
Page count: 18