Generating Entity Embeddings for Populating Wikipedia Knowledge Graph by Notability Detection

Cited by: 0
Authors
Thota, Gokul [1]
Varma, Vasudeva [1]
Affiliations
[1] IIIT Hyderabad, Hyderabad, India
Source
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, PT II, NLDB 2024 | 2024, Vol. 14763
Keywords
Knowledge Graphs; Notability; Entity Embeddings; Wikipedia; Classification
DOI
10.1007/978-3-031-70242-6_2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Knowledge graphs (KGs) play a crucial role in leveraging information on the web for several downstream tasks. Previous KG population methods typically rely on a fixed collection of documents rather than analyzing entity-specific content. We define an approach that populates such KGs by using entity-specific content on the web to generate entity embeddings. We empirically demonstrate the approach's effectiveness on a downstream task, notability detection for the Wikipedia knowledge graph. To moderate the content uploaded to Wikipedia, its editors define "notability" guidelines that identify entities warranting an article on Wikipedia. So far, notability has been enforced by humans, which limits scalability. In this paper, we define a multipronged approach based on web-based entity features to construct entity embeddings for determining an entity's notability. We distinguish entities by category and use neural networks for classification. Our system outperforms machine learning-based classifiers and handcrafted entity salience detection algorithms, achieving an accuracy of around 88%. It provides a scalable alternative to manual decision-making about the importance of a topic and could be extended to other such KG-based tasks.
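The pipeline the abstract outlines (web-based entity features, a category signal, a neural classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the category names, the two feature values, the network size, and every function and class name below are hypothetical, chosen only to show how a feature vector might be combined with a category indicator and scored by a small neural network.

```python
import math
import random

# Hypothetical entity categories; the abstract does not list the real ones.
CATEGORIES = ["person", "organization", "film"]


def build_entity_embedding(web_features, category):
    """Concatenate web-derived feature values with a one-hot category vector."""
    one_hot = [1.0 if c == category else 0.0 for c in CATEGORIES]
    return list(web_features) + one_hot


class TinyMLP:
    """One-hidden-layer network with a sigmoid output (notable vs. not notable)."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations use tanh; the output is a probability in (0, 1).
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return 1.0 / (1.0 + math.exp(-z)), h

    def train_step(self, x, y, lr=0.1):
        """One stochastic gradient step on binary cross-entropy loss."""
        p, h = self.forward(x)
        dz = p - y  # gradient of the loss with respect to the output logit
        for j in range(len(self.w2)):
            # Back-propagate through tanh before updating w2[j].
            dh = dz * self.w2[j] * (1.0 - h[j] ** 2)
            self.w2[j] -= lr * dz * h[j]
            for i in range(len(x)):
                self.w1[j][i] -= lr * dh * x[i]
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz
        return p


if __name__ == "__main__":
    # Toy, made-up data: two normalized web features per entity, label 1 = notable.
    data = [
        (build_entity_embedding([0.9, 0.8], "person"), 1.0),
        (build_entity_embedding([0.1, 0.05], "person"), 0.0),
    ]
    model = TinyMLP(n_in=len(data[0][0]))
    for _ in range(200):
        for x, y in data:
            model.train_step(x, y)
    for x, y in data:
        print(y, round(model.forward(x)[0], 3))
```

A real system would replace the toy features with signals extracted from web content about each entity and would be evaluated on labeled Wikipedia entities. The one-hot category here is only one possible way to "distinguish entities based on their categories"; training a separate classifier per category would be another.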
Pages: 10-23
Number of pages: 14