What Makes a Good Biography? Multidimensional Quality Analysis Based on Wikipedia Article Feedback Data

Cited by: 15
Authors
Flekova, Lucie [1, 2]
Ferschke, Oliver [1, 2]
Gurevych, Iryna [1, 2]
Affiliations
[1] German Inst Educ Res & Educ Informat, Ubiquitous Knowledge Proc Lab UKP DIPF, Frankfurt, Germany
[2] Tech Univ Darmstadt, Dept Comp Sci, Ubiquitous Knowledge Proc Lab UKP TUDA, Darmstadt, Germany
Source
WWW '14: Proceedings of the 23rd International Conference on World Wide Web | 2014
DOI
10.1145/2566486.2567972
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With more than 22 million articles, the largest collaborative knowledge resource never sleeps, experiencing several article edits every second. Over one fifth of these articles describe individual people, the majority of whom are still alive. Such articles are, by their nature, prone to corruption and vandalism. Manual quality assurance by experts can barely cope with this massive amount of data. Can it be effectively replaced by feedback from the crowd? Can we provide meaningful support for quality assurance with automated text processing techniques? Which properties of the articles should then play a key role in the machine learning algorithms and why? In this paper, we study the user-perceived quality of Wikipedia articles based on a novel Wikipedia user feedback dataset. In contrast to previous work on quality assessment, which mostly relied on judgements of active Wikipedia authors, we analyze ratings of ordinary Wikipedia users along four quality dimensions (complete, well written, trustworthy and objective). We first present an empirical analysis of the novel dataset with over 36 million Wikipedia article ratings. We then select a subset of biographical articles and perform classification experiments to predict their quality ratings along each of the dimensions, exploring multiple linguistic, surface and network properties of the rated articles. Additionally, we study the classification performance and differences for the biographies of living and dead people as well as those for men and women. We demonstrate the effectiveness of our approach with F1 scores of 0.94, 0.89, 0.73, and 0.73 for the dimensions complete, well written, trustworthy, and objective, respectively. Based on the results, we believe that the quality assessment of big textual data can be effectively supported by current text classification and language processing tools.
Pages: 855-865
Page count: 11