Cross-Language Source Code Re-Use Detection Using Latent Semantic Analysis

Cited by: 1
Authors
Flores, Enrique [1]
Barron-Cedeno, Alberto [2]
Moreno, Lidia [1]
Rosso, Paolo [1]
Affiliations
[1] Univ Politecn Valencia, E-46022 Valencia, Spain
[2] HBKU, Qatar Comp Res Inst, Doha, Qatar
Keywords
Cross-Language Re-Use Detection; Source Code; Plagiarism; Latent Semantic Analysis;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Nowadays, the Internet is the main source of information: blogs, encyclopedias, discussion forums, source code repositories, and many other resources are just one click away. The temptation to re-use these materials is very high. Even source code is easily available through a simple Web search. There is therefore a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can handle only the programming languages supported by the compiler. We assume that source code is a piece of text, with its own syntax and structure, and therefore aim to apply models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models and is able to distinguish between re-used and related source codes with high performance.
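The idea of treating source code as plain text and comparing it in an LSA space can be illustrated with a minimal sketch. The abstract does not specify the features, weighting, or dimensionality used in the paper, so everything below is an assumption made for illustration only: character trigrams as terms, raw counts as weights, a hypothetical helper named `lsa_similarity`, and a tiny rank `k`. It is not the authors' implementation.

```python
from collections import Counter

import numpy as np


def char_ngrams(code, n=3):
    """Lowercase, collapse whitespace, and count character n-grams."""
    text = " ".join(code.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def lsa_similarity(docs, k=2, n=3):
    """Pairwise cosine similarity of docs in a rank-k LSA space.

    Assumed setup (not taken from the paper): character n-gram
    terms with raw frequency weights.
    """
    profiles = [char_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*profiles))
    index = {gram: i for i, gram in enumerate(vocab)}

    # Term-document matrix: one row per n-gram, one column per document.
    A = np.zeros((len(vocab), len(docs)))
    for j, profile in enumerate(profiles):
        for gram, count in profile.items():
            A[index[gram], j] = count

    # Truncated SVD: keep only the k largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    D = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

    # Normalise rows so the dot product is the cosine similarity.
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    D = D / norms
    return D @ D.T


docs = [
    "for (int i = 0; i < n; i++) { sum += a[i]; }",         # C-style loop
    "for (int i = 0; i < n; i++) { total += values[i]; }",  # Java-style loop
    "print('hello world')",                                 # Python one-liner
]
sim = lsa_similarity(docs, k=2)
print(np.round(sim, 3))
```

Character n-grams are used here only because they need no language-specific tokenizer, which suits a cross-language setting; a real system would also have to choose the weighting scheme and the number of latent dimensions empirically.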
Pages: 1708-1725
Page count: 18
Related papers
9 in total
  • [1] Towards the Detection of Cross-Language Source Code Reuse
    Flores, Enrique
    Barron-Cedeno, Alberto
    Rosso, Paolo
    Moreno, Lidia
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2011, 6716 : 250 - 253
  • [2] Cross-Language Automatic Plagiarism Detector Using Latent Semantic Analysis and Self-Organizing Map
    Ratna, Anak Agung Putri
    Nabhastala, Paskalis Nandana Yestha
    Ibrahim, Ihsan
    Ekadiyanto, F. Astha
    Salman, Muhammad
    Herusaktiawan, Muhammad Yusuf Irfan
    Purnamasari, Prima Dewi
    AIVR 2018: 2018 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND VIRTUAL REALITY, 2018, : 83 - 87
  • [3] An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis
    Cosma, Georgina
    Joy, Mike
    IEEE TRANSACTIONS ON COMPUTERS, 2012, 61 (03) : 379 - 394
  • [4] DeleSmell: Code smell detection based on deep learning and latent semantic analysis
    Zhang, Yang
    Ge, Chuyan
    Hong, Shuai
    Tian, Ruili
    Dong, Chunhao
    Liu, Jingjing
    KNOWLEDGE-BASED SYSTEMS, 2022, 255
  • [5] USING CONCEPTS OF TEXT BASED PLAGIARISM DETECTION IN SOURCE CODE PLAGIARISM ANALYSIS
    Duracik, Michal
    Krsak, Emil
    Hrkut, Patrik
    PLAGIARISM ACROSS EUROPE AND BEYOND 2017, 2017, : 177 - 186
  • [6] Providing a Source Code Security Analysis Model Using Semantic Web Techniques
    EkramiFard, Ala
    Kahani, Mohsen
    SECOND INTERNATIONAL CONGRESS ON TECHNOLOGY, COMMUNICATION AND KNOWLEDGE (ICTCK 2015), 2015, : 33 - 37
  • [7] Automatic Evaluation for E-Learning Using Latent Semantic Analysis : A Use Case
    Farrus, Mireia
    Costa-jussa, Marta R.
    INTERNATIONAL REVIEW OF RESEARCH IN OPEN AND DISTRIBUTED LEARNING, 2013, 14 (01) : 239 - 254
  • [8] Source Code Plagiarism Detection and Performance Analysis Using Fingerprint Based Distance Measure Method
    Narayanan, Sandhya
    Simi, S.
    PROCEEDINGS OF 2012 7TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION, VOLS I-VI, 2012, : 1065 - 1068
  • [9] Answer Categorization Method Using K-Means for Indonesian Language Automatic Short Answer Grading System Based on Latent Semantic Analysis
    Ratna, Anak Agung Putri
    Wulandari, Naiza Astri
    Kaltsum, Aaliyah
    Ibrahim, Ihsan
    Purnamasari, Prima Dewi
    2019 16TH INTERNATIONAL CONFERENCE ON QUALITY IN RESEARCH (QIR) / INTERNATIONAL SYMPOSIUM ON ELECTRICAL AND COMPUTER ENGINEERING, 2019, : 110 - 114