Predicting Links on Wikipedia with Anchor Text Information

Cited by: 0
Authors
Brochier, Robin [1]
Bechet, Frederic [1]
Affiliations
[1] Aix Marseille Univ, Univ Toulon, CNRS, LIS, Marseille, France
Source
SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL | 2021
Keywords
Wikipedia; link prediction; evaluation; hyperlinks; networks
DOI
10.1145/3404835.3462994
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
Pages: 1758-1762
Number of pages: 5