Controversy detection in Wikipedia using semantic dissimilarity

Cited by: 5

Authors:

Jhandir, M. Zeeshan [1]

Tenvir, Ali [1]

On, Byung-Won [2]

Lee, Ingyu [3]

Choi, Gyu Sang [1]

Affiliations:

[1] Yeungnam Univ, Dept Informat & Commun Engn, Gyongsan 38541, Gyeongbuk, South Korea

[2] Kunsan Natl Univ, Dept Software Convergence Engn, Gunsan Si 54150, Jeollabuk Do, South Korea

[3] Troy Univ, Sorrell Coll Business, Troy, AL 36082 USA

Source:

INFORMATION SCIENCES | 2017 / Volume 418

Funding:

National Research Foundation of Singapore;

Keywords:

Wikipedia; Controversy; Semantic dissimilarity; Sentence similarity; Natural language processing; Edit similarity; SIMILARITY; MODELS;

DOI:

10.1016/j.ins.2017.08.037

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The advent of search engines and wikis has made access to information easy and almost free. Wikipedia is the efficacious outcome of an enormous collaboration, and its peer review-like methods of creation, maintenance, and evolution of contents, ensure high quality and reliability. However, the "anyone-can-edit" policy of Wikipedia has created many problems such as trolling, vandalism, controversies, and doubts about the content and reliability of the information provided due to non-expert involvement. People have tried to identify and rank controversies in Wikipedia articles through various techniques that use quantitative data, ignoring the semantic significance of conflicts among authors. In this paper, we have addressed the problem of identifying controversy using natural language processing techniques for the first time. The proposed method spots the impact on existing meanings of the text due to new editing processes along with their relationship to the topic of the article. The experimental results for precision (0.901), recall (0.901), accuracy (0.908), and F-measure (0.901) demonstrate the effectiveness of the proposed method. The technique is deemed useful for automatic identification of conflicts newly introduced into existing article contents, and could prove helpful in making decisions for inclusion or exclusion of controversies under the same topic. (C) 2017 Elsevier Inc. All rights reserved.

Citation:

Pages: 581-600

Page count: 20

47 references in total

