Development and benchmarking of multilingual code clone detector

Cited by: 0
Authors
Zhu, Wenqing [1]
Yoshida, Norihiro [2]
Kamiya, Toshihiro [3]
Choi, Eunjong [4]
Takada, Hiroaki [1]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Furo Cho, Nagoya, Aichi 4648601, Japan
[2] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, 2-150 Iwakura Cho, Osaka, Ibaraki 5678570, Japan
[3] Shimane Univ, Interdisciplinary Fac Sci & Engn, 1060 Nishikawatsu Cho, Matsue, Shimane 6908504, Japan
[4] Kyoto Inst Technol, Fac Informat & Human Sci, Sakyo Ku, Kyoto, Kyoto 6068585, Japan
Keywords
Code clone; Parser generation; Benchmark testing; SEARCH; ANTLR;
DOI
10.1016/j.jss.2024.112215
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
The diversity of programming languages is growing, making the language extensibility of code clone detectors crucial. However, extending most existing clone detectors is challenging because the source code handler must be modified, which requires specialist-level knowledge of the target language and is time-consuming. Multilingual code clone detectors make it easier to add support for a new language because they require only the syntax information of the target language. To address the shortcomings of existing multilingual detectors in language scalability and detection performance, we propose a multilingual code block extraction method based on ANTLR parser generation and implement a multilingual code clone detector (MSCCD), which supports the largest number of languages among currently available detectors and can detect Type-3 code clones. We follow the methodology of previous studies to evaluate its detection performance on Java. Compared with ten state-of-the-art detectors, MSCCD performs at an average level while supporting a significantly larger number of languages. Furthermore, we propose the first multilingual syntactic code clone evaluation benchmark, based on the CodeNet database. Our results reveal that even with the same detection approach, performance can vary markedly depending on the language of the source code under investigation. Overall, MSCCD is the most balanced of the evaluated tools when both detection performance and language extensibility are considered.
Pages: 20