A Mixed Fuzzy Similarity Approach to Detect Plagiarism in Persian Texts

Cited by: 0

Authors:

Ahangarbahan, Hamid [1]

Montazer, Gholam Ali [1]

Affiliation:

[1] Tarbiat Modares Univ, Sch Engn, Tehran, Iran

Source:

ADVANCES IN COMPUTATIONAL INTELLIGENCE, PT I (IWANN 2015) | 2015 / Vol. 9094

Keywords:

Plagiarism; Similarity metric; Fuzzy sets; Semantic similarity; Lexical similarity;

DOI:

10.1007/978-3-319-19258-1_43

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

A variety of methods and metrics have been proposed to measure the similarity between documents and to support plagiarism detection systems. However, most of them do not account for the ambiguity inherent in natural language. This paper therefore proposes a new method that combines lexical and semantic features and similarity measures. In the first step, after preprocessing and stop-word removal, the words of a text are divided into two groups: general words and domain-specific knowledge words. A mixed lexical and semantic fuzzy inference system is then designed to assess text similarity. The proposed method was evaluated on the Persian paper abstracts of the International Conference on e-Learning and e-Teaching (ICELET) corpus, using an IT domain knowledge ontology. The results indicate that the proposed method achieves a precision of 79% and detects 83% of the plagiarism cases.
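The pipeline outlined in the abstract (stop-word removal, a split into general and domain-specific words, then fuzzy inference over similarity scores) can be illustrated with a minimal sketch. This is not the authors' system: the membership functions, the rule table, the use of Jaccard overlap as a stand-in for both the lexical and the ontology-based semantic measure, and all names (`tokenize`, `split_by_domain`, `fuzzy_similarity`, the toy word lists) are assumptions made for illustration only. English sample text is used to keep the example self-contained.

```python
# Illustrative sketch only; every name and parameter here is hypothetical.

def tokenize(text, stop_words):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]

def split_by_domain(tokens, domain_terms):
    """Split tokens into (general, domain-specific) word sets."""
    domain = {t for t in tokens if t in domain_terms}
    general = {t for t in tokens if t not in domain_terms}
    return general, domain

def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets are treated as dissimilar (assumption)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Triangular membership functions on [0, 1]; they sum to 1 at every point.
def low(x):
    return max(0.0, 1.0 - 2.0 * x)

def medium(x):
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))

def high(x):
    return max(0.0, 2.0 * x - 1.0)

LABELS = [(low, 0.0), (medium, 0.5), (high, 1.0)]

def fuzzy_similarity(general_sim, domain_sim):
    """Zero-order Sugeno-style inference over a 3x3 rule grid.

    Each rule fires with strength min(mu_general, mu_domain) and outputs the
    mean of the two label centres (an assumed rule table, not the paper's).
    """
    num = 0.0
    den = 0.0
    for mu_g, centre_g in LABELS:
        for mu_d, centre_d in LABELS:
            strength = min(mu_g(general_sim), mu_d(domain_sim))
            num += strength * (centre_g + centre_d) / 2.0
            den += strength
    return num / den if den > 0 else 0.0

if __name__ == "__main__":
    stop_words = {"the", "a", "of", "in", "for"}           # toy list
    domain_terms = {"ontology", "e-learning", "plagiarism"}  # toy "ontology"

    doc_a = "the ontology of e-learning systems for plagiarism detection"
    doc_b = "a study of plagiarism in e-learning courses"

    gen_a, dom_a = split_by_domain(tokenize(doc_a, stop_words), domain_terms)
    gen_b, dom_b = split_by_domain(tokenize(doc_b, stop_words), domain_terms)

    score = fuzzy_similarity(jaccard(gen_a, gen_b), jaccard(dom_a, dom_b))
    print(f"fuzzy similarity: {score:.3f}")
```

A real system for Persian text would need a Persian-aware tokenizer and stop-word list, and a semantic measure derived from the domain ontology in place of the set overlap used here.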


Pages: 525-534

Page count: 10