Fuzzy Semantic-Based String Similarity Experiments to Detect Plagiarism in Indonesian Documents

Authors
Umareta, Chonan Firda Odayakana [1]
Mariyah, Siti [1]
Affiliation
[1] STIS Polytech Stat, Jakarta, Indonesia
Source
2019 3RD INTERNATIONAL CONFERENCE ON INFORMATICS AND COMPUTATIONAL SCIENCES (ICICOS 2019) | 2019
Keywords
plagiarism; fuzzy; Jaccard; similarity;
DOI
10.1109/icicos48119.2019.8982501
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Plagiarism is a concern in education. One way to counter it is to compare documents against one another. Because the number of documents is large, an extrinsic plagiarism detection framework is needed to carry out these comparisons at scale. There is also intelligent plagiarism, in which plagiarists try to hide their actions, for example by replacing words with semantically equivalent ones. This study therefore applies an extrinsic plagiarism detection system using a Fuzzy Semantic-Based String Similarity method divided into three stages: Preprocessing, Heuristic Retrieval (HR), and Detailed Analysis (DA). The preprocessing stage removes irrelevant characters, splits the text into sentences, and performs stemming, tokenization, and stopword removal. The HR stage searches for candidate document pairs using fingerprints and Jaccard similarity. The DA stage applies fuzzy semantic-based similarity. The experiments compare the document similarity levels produced by Jaccard similarity in the HR stage with those produced by fuzzy semantic-based similarity in the DA stage, since both yield a document-level similarity score. The results show that fuzzy semantic-based similarity performs better than Jaccard similarity because it can detect semantic similarity in the form of synonyms.
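The contrast the abstract draws between the two measures can be illustrated with a minimal sketch. This is not the authors' implementation: the word-bigram "fingerprints", the tiny synonym table, the 0.5 score for synonym pairs, and every function name below are assumptions made only for illustration, and the Indonesian tokens are assumed to be already stemmed and stopword-free.

```python
# Hypothetical synonym table (assumption); a real system would use a thesaurus.
SYNONYMS = {
    frozenset({"siswa", "murid"}),    # "student"
    frozenset({"pintar", "cerdas"}),  # "smart"
}


def shingles(tokens, k=2):
    """Return the set of k-word shingles, used here as a stand-in for fingerprints."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: |A intersect B| / |A union B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def word_similarity(w1, w2):
    """Fuzzy word relation (assumed values): 1.0 identical, 0.5 synonyms, else 0.0."""
    if w1 == w2:
        return 1.0
    if frozenset({w1, w2}) in SYNONYMS:
        return 0.5
    return 0.0


def fuzzy_semantic_similarity(tokens_a, tokens_b):
    """Average, over the words of A, of each word's best match in B."""
    if not tokens_a or not tokens_b:
        return 0.0
    best = [max(word_similarity(wa, wb) for wb in tokens_b) for wa in tokens_a]
    return sum(best) / len(best)


# Two sentences that differ only by synonym substitution.
source = ["siswa", "pintar", "rajin", "belajar"]
suspect = ["murid", "cerdas", "rajin", "belajar"]

jaccard_score = jaccard(shingles(source), shingles(suspect))
fuzzy_score = fuzzy_semantic_similarity(source, suspect)

print(f"Jaccard on bigram shingles: {jaccard_score:.2f}")  # 0.20
print(f"Fuzzy semantic similarity:  {fuzzy_score:.2f}")    # 0.75
```

In this toy case only one of five distinct bigrams is shared, so the Jaccard score is low, while the fuzzy measure gives partial credit for each synonym pair. This matches the kind of difference the abstract describes, though the numbers here say nothing about the paper's actual results.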
Pages: 6