Improving Wikipedia verifiability with AI

Cited by: 7
Authors
Petroni, Fabio [1]
Broscheit, Samuel [2]
Piktus, Aleksandra [3]
Lewis, Patrick [3]
Izacard, Gautier [3, 4]
Hosseini, Lucas [3]
Dwivedi-Yu, Jane [3]
Lomeli, Maria [3]
Schick, Timo [3]
Bevilacqua, Michele [1]
Mazaré, Pierre-Emmanuel [3]
Joulin, Armand [3]
Grave, Edouard [3]
Riedel, Sebastian [3, 5]
Affiliations
[1] Samaya AI, London, England
[2] Amazon Alexa AI, Tübingen, Germany
[3] Meta, FAIR, London, England
[4] PSL Univ, Inria & ENS, Paris, France
[5] UCL, London, England
Keywords
Compendex
DOI
10.1038/s42256-023-00726-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Verifiability is a core content policy of Wikipedia: claims need to be backed by citations. Maintaining and improving the quality of Wikipedia references is an important challenge, and there is a pressing need for better tools to assist humans in this effort. We show that the process of improving references can be tackled with the help of artificial intelligence (AI) powered by an information retrieval system and a language model. This neural-network-based system, which we call SIDE, can identify Wikipedia citations that are unlikely to support their claims and subsequently recommend better ones from the web. We train this model on existing Wikipedia references, thereby learning from the contributions and combined wisdom of thousands of Wikipedia editors. Using crowdsourcing, we observe that for the 10% of citations that our system considers most likely to be unverifiable, humans prefer the system's suggested alternatives over the originally cited reference 70% of the time. To validate the applicability of our system, we built a demo to engage with the English-speaking Wikipedia community and found that SIDE's first citation recommendation is preferred twice as often as the existing Wikipedia citation for the same top 10% of claims that SIDE considers most likely to be unverifiable. Our results indicate that an AI-based system could be used, in tandem with humans, to improve the verifiability of Wikipedia.
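The abstract describes a two-stage loop: score how well each existing citation supports its claim, flag the least-supported fraction (the evaluation focuses on the top 10%), then retrieve alternative sources and recommend one only if it scores higher than the current citation. The sketch below illustrates that control flow under loudly stated assumptions. All names (`Citation`, `overlap_score`, `retrieve`, `flag_least_supported`, `recommend`) are hypothetical. The token-overlap scorer is a crude stand-in for the neural verification model, and the small in-memory list of passages stands in for a web-scale retrieval system. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical scoring function type: (claim, passage) -> support score.
Scorer = Callable[[str, str], float]


@dataclass
class Citation:
    claim: str        # sentence in the article that needs support
    source_text: str  # text of the currently cited source


def overlap_score(claim: str, passage: str) -> float:
    """Toy stand-in for a learned verification model: the fraction of
    distinct claim tokens that also occur in the passage (0.0 to 1.0)."""
    claim_tokens = set(claim.lower().split())
    passage_tokens = set(passage.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & passage_tokens) / len(claim_tokens)


def retrieve(claim: str, corpus: List[str], score: Scorer, n: int = 5) -> List[str]:
    """Toy retriever: return the n corpus passages that score highest
    for the claim, best first."""
    ranked = sorted(corpus, key=lambda p: score(claim, p), reverse=True)
    return ranked[:n]


def flag_least_supported(citations: List[Citation], score: Scorer,
                         fraction: float = 0.10) -> List[Citation]:
    """Return the given fraction of citations (at least one) whose
    current source gives the weakest support for its claim."""
    k = max(1, round(len(citations) * fraction))
    ranked = sorted(citations, key=lambda c: score(c.claim, c.source_text))
    return ranked[:k]


def recommend(citation: Citation, corpus: List[str], score: Scorer,
              n: int = 5) -> Optional[str]:
    """Return the top retrieved passage if it supports the claim better
    than the existing source; otherwise keep the citation (None)."""
    current = score(citation.claim, citation.source_text)
    candidates = retrieve(citation.claim, corpus, score, n)
    if candidates and score(citation.claim, candidates[0]) > current:
        return candidates[0]
    return None


if __name__ == "__main__":
    a = Citation("the eiffel tower is in paris",
                 "the eiffel tower is a landmark in paris")
    b = Citation("water boils at 100 degrees celsius",
                 "a recipe for chocolate cake")
    corpus = ["at sea level water boils at 100 degrees celsius",
              "paris is the capital of france"]

    flagged = flag_least_supported([a, b], overlap_score, fraction=0.5)
    print([c.claim for c in flagged])  # → ['water boils at 100 degrees celsius']
    for c in flagged:
        print(recommend(c, corpus, overlap_score))  # → at sea level water boils at 100 degrees celsius
```

Keeping the "only recommend if better" comparison separate from flagging mirrors the two capabilities named in the abstract (identifying weak citations, then recommending replacements). In a real system both the scorer and the retriever would be learned models, and the final decision would stay with a human editor.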
Pages: 1142-1148
Number of pages: 7