History-based Article Quality Assessment on Wikipedia

Cited by: 51
Authors
Zhang, Shiyue [1]
Hu, Zheng [1]
Zhang, Chunhong [1]
Yu, Ke [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
Source
2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP) | 2018
Keywords
Wikipedia; Information Quality; LSTM
DOI
10.1109/BigComp.2018.00010
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Wikipedia is widely considered the largest encyclopedia on the Internet. Quality assessment of Wikipedia articles has been studied for years. Conventional methods address this task with feature engineering and statistical machine learning algorithms. However, it is difficult for manually defined features to represent the long edit history of an article. Recently, researchers proposed an end-to-end neural model that uses a Recurrent Neural Network (RNN) to learn the representation automatically. Although the RNN showed its power in modeling edit history, the end-to-end method is time- and resource-consuming. In this paper, we propose a new history-based method to represent an article. We also take advantage of an RNN to handle the long edit history, but we do not abandon feature engineering: we still represent each revision of an article by manually defined features. This combination of a deep neural model and feature engineering enables our model to be both simple and effective. Experiments demonstrate that our model performs better than or comparably to previous work, and that it has the potential to work as a real-time service. In addition, we extend our model to quality prediction.
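The approach described in the abstract (hand-crafted features for each revision, then a recurrent network over the revision sequence) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the five features, the hidden size, the six-class label set (the English Wikipedia assessment grades), and the random, untrained weights are not taken from the paper, and the pure-Python LSTM cell stands in for whatever framework the authors used.

```python
import math
import random

def revision_features(text):
    """Hypothetical hand-crafted features for one revision (not the paper's feature set)."""
    headings = sum(1 for line in text.splitlines() if line.startswith("=="))
    return [
        math.log1p(len(text)),           # log character count
        math.log1p(len(text.split())),   # log word count
        math.log1p(text.count("[[")),    # log number of wiki links
        math.log1p(text.count("<ref")),  # log number of reference tags
        math.log1p(headings),            # log number of section headings
    ]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class TinyLSTM:
    """A single LSTM layer with random (untrained) weights, for illustration only."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        width = n_in + n_hidden + 1  # input, previous hidden state, bias term
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(n_hidden)]
            for gate in "ifog"
        }

    def step(self, x, h, c):
        z = x + h + [1.0]  # list concatenation: [x; h; 1]
        pre = {gate: [sum(w * v for w, v in zip(row, z)) for row in rows]
               for gate, rows in self.W.items()}
        new_h, new_c = [], []
        for k in range(self.n_hidden):
            i = sigmoid(pre["i"][k])
            f = sigmoid(pre["f"][k])
            o = sigmoid(pre["o"][k])
            cand = math.tanh(pre["g"][k])
            ck = f * c[k] + i * cand
            new_c.append(ck)
            new_h.append(o * math.tanh(ck))
        return new_h, new_c

    def encode(self, sequence):
        """Run over the whole edit history and return the final hidden state."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in sequence:
            h, c = self.step(x, h, c)
        return h

# Toy edit history: three successive revisions of one article.
history = [
    "Stub text.",
    "Stub text with a [[link]].",
    "== Intro ==\nLonger text with a [[link]] and <ref>a source</ref>.",
]
features = [revision_features(rev) for rev in history]

lstm = TinyLSTM(n_in=5, n_hidden=4)
h = lstm.encode(features)  # fixed-size representation of the article's history

# Assumed label set; the output layer is random here and would be trained in practice.
QUALITY_CLASSES = ["FA", "GA", "B", "C", "Start", "Stub"]
rng = random.Random(1)
W_out = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in QUALITY_CLASSES]
probs = softmax([sum(w * v for w, v in zip(row, h)) for row in W_out])
```

Because each revision is reduced to a short feature vector before it reaches the recurrent layer, the sequence model only has to process a handful of numbers per revision instead of the full text, which is in line with the abstract's point about the cost of end-to-end models.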
Pages: 1 - 8
Number of pages: 8