Predicting Links on Wikipedia with Anchor Text Information

Cited by: 0
Authors
Brochier, Robin [1]
Bechet, Frederic [1]
Affiliation
[1] Aix Marseille Univ, Univ Toulon, CNRS, LIS, Marseille, France
Source
SIGIR '21: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval | 2021
Keywords
Wikipedia; link prediction; evaluation; hyperlinks; networks
DOI
10.1145/3404835.3462994
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
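The distinction between the transductive and inductive tasks mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual meaning of the terms: in the transductive setting every article is seen during training and only some links are hidden, while in the inductive setting whole articles are held out together with all their links. The toy data and the function names `transductive_split` and `inductive_split` are hypothetical, and the paper's actual sampling protocol may differ.

```python
import random

# Hypothetical toy data: (source article, anchor text, target article).
LINKS = [
    ("Paris", "France", "France"),
    ("Paris", "the Seine", "Seine"),
    ("France", "capital", "Paris"),
    ("Seine", "Paris", "Paris"),
    ("Lyon", "France", "France"),
    ("Lyon", "the Rhone", "Rhone"),
]


def transductive_split(links, test_ratio, seed=0):
    """Hide a random subset of links; any article may still appear in training."""
    rng = random.Random(seed)
    shuffled = list(links)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    # First n_test shuffled links are held out, the rest are kept for training.
    return shuffled[n_test:], shuffled[:n_test]


def inductive_split(links, held_out_articles):
    """Hide every link that touches a held-out article, so that article is unseen in training."""
    train, test = [], []
    for source, anchor, target in links:
        if source in held_out_articles or target in held_out_articles:
            test.append((source, anchor, target))
        else:
            train.append((source, anchor, target))
    return train, test


if __name__ == "__main__":
    t_train, t_test = transductive_split(LINKS, test_ratio=0.5)
    i_train, i_test = inductive_split(LINKS, {"Lyon"})
    print(len(t_train), len(t_test), len(i_train), len(i_test))
```

In this sketch a model evaluated on the inductive split has to rely on text alone (for example the anchor text) for the held-out article, since no link involving it is available at training time.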
Pages: 1758-1762
Number of pages: 5