Semantic-Based Integrated Plagiarism Detection Approach for English Documents

Cited by: 1
Authors
Kaur, Manpreet [1]
Gupta, Vishal [1]
Kaur, Ravreet [1]
Affiliations
[1] Panjab Univ, Univ Inst Engn & Technol, Chandigarh 160014, India
Keywords
Extrinsic plagiarism detection; natural language processing; PAN-PC dataset; relation matrix; semantic similarity; WordNet; N-GRAMS; SIMILARITY; DATABASE
DOI
10.1080/03772063.2021.2004383
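Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract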
This work presents a novel plagiarism detection system that uses semantic features to uncover cases of plagiarism. The system constructs a dynamic relation matrix for each suspicious and source sentence pair to measure their degree of similarity using semantic features. The paper introduces two procedures, Weighted Inverse Distance and GlossDice, which draw on several text properties (synonyms, shortest path, etc.) to overcome the limitations of existing features, together with a new similarity metric for plagiarism detection. The research also investigates the independent performance of various features in detecting plagiarized cases and combines the best features, assigning them different weights to further enhance system performance. Weighted Inverse Distance integrated with SynJaccard boosts the system performance and shows promising results. All experiments were first performed on the PAN-PC-11 dataset, and the PAN-14 text alignment dataset was then used to validate the results. The effectiveness of the proposed system was measured using standard performance measures, i.e., precision, recall, F-measure, granularity, and plagdet score. The proposed system outperformed the baseline systems, with precision (0.9459), recall (0.8861), F-measure (0.8917), and plagdet (0.8857) on the PAN-PC-11 dataset. On the PAN-14 text alignment dataset, the system achieves precision (0.9257), recall (0.9055), F-measure (0.8931), and plagdet (0.8806).
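The abstract names its evaluation measures without defining them. As a minimal sketch, assuming the definition commonly used in the PAN evaluation framework (plagdet is the F-measure divided by log2(1 + granularity)), the score could be computed as below. The function names and the input values are illustrative assumptions, not taken from the paper, and the figures reported in the abstract may be aggregated differently.

```python
import math


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def plagdet(precision: float, recall: float, granularity: float) -> float:
    """Plagdet as assumed here: F1 discounted by log2(1 + granularity).

    Granularity is at least 1; a value of 1 means each plagiarized
    passage was reported as a single detection, so no discount applies.
    """
    if granularity < 1:
        raise ValueError("granularity must be >= 1")
    return f_measure(precision, recall) / math.log2(1 + granularity)


# Hypothetical inputs, chosen only to show the effect of granularity.
print(plagdet(0.9, 0.8, 1.0))  # equals the plain F1
print(plagdet(0.9, 0.8, 3.0))  # half of F1, since log2(1 + 3) = 2
```

The logarithmic discount penalizes a detector that reports one plagiarized passage as many fragments, which precision and recall alone do not capture.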
Pages: 6120-6136
Page count: 17