CLUSTERING AND BOOTSTRAPPING BASED FRAMEWORK FOR NEWS KNOWLEDGE BASE COMPLETION

Cited by: 0
Authors
Srinivasa, K. [1]
Thilagam, P. Santhi [1]
Affiliation
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, NH 66, Mangalore 575025, India
Keywords
Knowledge base completion; natural language processing; information extraction; triples; bootstrap; cluster; INFORMATION EXTRACTION; CONSTRUCTION;
DOI
10.31577/cai_2021_2_318
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Extracting facts, namely entities and relations, from unstructured sources is an essential step in knowledge base construction. It is equally necessary to keep the knowledge base complete by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a knowledge refinement problem, in which missing facts are inferred by reasoning over the information already present in the knowledge base. However, facts missed during extraction from multilingual sources have been ignored. Hence, this work proposes a generic framework for knowledge base completion. The framework enriches a knowledge base of crime-related facts, extracted from English-language online news articles, with facts extracted from news articles in Hindi, a low-resourced Indian language. With the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as POS taggers, relying instead on an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that exploits the redundancy in a bilingual collection of news articles by representing clusters with knowledge base facts, unlike the existing Bag of Words representation. Within each cluster, the facts extracted from English articles are bootstrapped to extract facts from the comparable Hindi articles. Bootstrapping within a cluster helps identify sentences in the low-resourced language that carry new information related to the facts extracted from a high-resourced language such as English. The empirical results show that the proposed clustering algorithm produces more accurate clusters for monolingual facts and higher-quality clusters for cross-lingual facts. Experiments also show that the proposed framework achieves a high recall rate in extracting new facts from Hindi news articles.
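The abstract does not spell out the clustering or bootstrapping procedures. The Python sketch below only illustrates the general idea, under several assumptions. Each article is assumed to be already reduced to a set of (subject, relation, object) triples, and Hindi articles are assumed to be already machine-translated into English. Clusters are compared by the overlap of the entities in their triples, which is a simplification of "representing clusters with knowledge base facts". Bootstrapping is approximated by flagging translated Hindi sentences that mention both entities of an English fact. The names `Article`, `cluster_by_facts`, and `bootstrap_sentences`, and the `threshold` value of 0.3, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# A fact is a (subject, relation, object) triple, e.g. ("Accused X", "arrested_by", "City Police").
Triple = tuple[str, str, str]


@dataclass
class Article:
    doc_id: str
    lang: str  # "en" for English, "hi" for Hindi (assumed already machine-translated)
    facts: set[Triple]  # facts extracted from the (translated) article text
    sentences: list[str] = field(default_factory=list)  # translated sentences, used for "hi"


def entities(facts: set[Triple]) -> set[str]:
    """Collect the subject and object entities mentioned in a set of triples."""
    return {e for s, _, o in facts for e in (s, o)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_facts(articles: list[Article], threshold: float = 0.3) -> list[dict]:
    """Greedy single-pass clustering: an article joins the cluster whose pooled
    entities overlap most with its own, if that overlap reaches `threshold`;
    otherwise it starts a new cluster."""
    clusters: list[dict] = []
    for art in articles:
        ents = entities(art.facts)
        best, best_sim = None, 0.0
        for c in clusters:
            sim = jaccard(ents, c["entities"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(art)
            best["entities"] |= ents  # pool the new article's entities into the cluster
        else:
            clusters.append({"members": [art], "entities": set(ents)})
    return clusters


def bootstrap_sentences(cluster: dict) -> list[tuple[str, str, Triple]]:
    """Use the English facts of a cluster as seeds and return the Hindi-side
    sentences that mention both the subject and the object of some seed fact.
    These are candidate sentences from which new facts could then be extracted."""
    seeds = [f for a in cluster["members"] if a.lang == "en" for f in a.facts]
    hits = []
    for art in cluster["members"]:
        if art.lang != "hi":
            continue
        for sent in art.sentences:
            low = sent.lower()
            for s, r, o in seeds:
                if s.lower() in low and o.lower() in low:
                    hits.append((art.doc_id, sent, (s, r, o)))
                    break  # one seed match is enough to flag the sentence
    return hits


if __name__ == "__main__":
    # Toy data with invented entities, for illustration only.
    docs = [
        Article("en1", "en", {("Accused X", "arrested_by", "City Police")}),
        Article("hi1", "hi",
                {("Accused X", "arrested_by", "City Police"), ("Accused X", "arrested_in", "Old Town")},
                ["City Police arrested Accused X on Monday near the market."]),
        Article("en2", "en", {("Shop Owner Y", "victim_of", "Robbery")}),
    ]
    for i, c in enumerate(cluster_by_facts(docs)):
        print(i, [a.doc_id for a in c["members"]], bootstrap_sentences(c))
```

On the toy data, en1 and hi1 share both entities of the English fact and fall into the same cluster, and the translated Hindi sentence is flagged as a candidate. en2 shares no entities with that cluster and starts its own. The sketch uses entity overlap rather than exact triple matching because relation phrases in translated text rarely match the English wording exactly. This is a design choice of the sketch, not a description of the authors' algorithm.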
Pages: 318-340
Page count: 23