Predicting Links on Wikipedia with Anchor Text Information

Cited by: 0
Authors
Brochier, Robin [1]
Bechet, Frederic [1]
Affiliations
[1] Aix Marseille Univ, Univ Toulon, CNRS, LIS, Marseille, France
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Keywords
Wikipedia; link prediction; evaluation; hyperlinks; networks
DOI
10.1145/3404835.3462994
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
Pages: 1758-1762
Page count: 5
Related papers
50 in total
  • [41] Wikipedia as an Information Source on Cryptocurrency Technology
    Stolarski, Piotr
    Lewoniewski, Wlodzimierz
    BUSINESS INFORMATION SYSTEMS WORKSHOPS, BIS 2019, 2019, 373 : 299 - 308
  • [42] Cultural Structures of Knowledge from Wikipedia Networks of First Links
    Gabella, Maxime
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2019, 6 (03): : 249 - 252
  • [43] DOI Links on Wikipedia Analyses of English, Japanese, and Chinese Wikipedias
    Kikkawa, Jiro
    Takaku, Masao
    Yoshikane, Fuyuki
    DIGITAL LIBRARIES: KNOWLEDGE, INFORMATION, AND DATA IN AN OPEN ACCESS SOCIETY, 2016, 10075 : 369 - 380
  • [44] A fast algorithm for predicting links to nodes of interest
    Chen, Bolun
    Chen, Ling
    Li, Bin
    INFORMATION SCIENCES, 2016, 329 : 552 - 567
  • [45] WikiTrends: Unstructured Wikipedia-Based Text Analytics Framework
    Gerguis, Michel Naim
    Salama, Cherif
    El-Kharashi, M. Watheq
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2017, 2017, 10260 : 45 - 57
  • [46] WikiLyzer: Interactive Information Quality Assessment in Wikipedia
    di Sciascio, Cecilia
    Strohmaier, David
    Errecalde, Marcelo
    Veas, Eduardo
    IUI'17: PROCEEDINGS OF THE 22ND INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES, 2017, : 377 - 388
  • [47] TextRank algorithm by exploiting Wikipedia for short text keywords extraction
    Li, Wengen
    Zhao, Jiabao
    2016 3RD INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND CONTROL ENGINEERING (ICISCE), 2016, : 683 - 686
  • [48] Entity ranking in Wikipedia: utilising categories, links and topic difficulty prediction
    Jovan Pehcevski
    James A. Thom
    Anne-Marie Vercoustre
    Vladimir Naumovski
    Information Retrieval, 2010, 13 : 568 - 600
  • [49] An efficient approach for measuring semantic relatedness using Wikipedia bidirectional links
    Xinhua Zhu
    Qingsong Guo
    Bo Zhang
    Fei Li
    Applied Intelligence, 2019, 49 : 3708 - 3730
  • [50] Seeking Health Information in Wikipedia and Readers' Satisfaction
    Ju, Boryung
    Jung, Yoonhyuk
    Bourgeois, John P.
    Proceedings of the Association for Information Science and Technology, 2021, 58 (01): : 744 - 746