CLUSTERING AND BOOTSTRAPPING BASED FRAMEWORK FOR NEWS KNOWLEDGE BASE COMPLETION

Authors
Srinivasa, K. [1]
Thilagam, P. Santhi [1]
Affiliation
[1] Natl Inst Technol Karnataka, Dept Comp Sci & Engn, NH 66, Mangalore 575025, India
Keywords
Knowledge base completion; natural language processing; information extraction; triples; bootstrap; cluster; INFORMATION EXTRACTION; CONSTRUCTION
DOI
10.31577/cai_2021_2_318
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Extracting facts, namely entities and relations, from unstructured sources is an essential step in any knowledge base construction. At the same time, it is also necessary to ensure the completeness of the knowledge base by incrementally extracting new facts from various sources. To date, knowledge base completion has been studied as a problem of knowledge refinement, where missing facts are inferred by reasoning about the information already present in the knowledge base. However, facts missed while extracting information from multilingual sources are ignored. Hence, this work proposes a generic framework for knowledge base completion that enriches a knowledge base of crime-related facts extracted from English-language online news articles with facts extracted from news articles in Hindi, a low-resourced Indian language. Using the framework, information can be extracted from news articles in any low-resourced language without language-specific tools such as POS taggers, given an appropriate machine translation tool. To achieve this, a clustering algorithm is proposed that explores the redundancy among the bilingual collection of news articles by representing the clusters with knowledge base facts, unlike the existing Bag of Words representation. Within each cluster, the facts extracted from English-language articles are bootstrapped to extract facts from comparable Hindi-language articles. Bootstrapping within the cluster in this way helps to identify sentences from a low-resourced language that are enriched with new information related to the facts extracted from a high-resourced language like English. The empirical results show that the proposed clustering algorithm produced more accurate and high-quality clusters for monolingual and cross-lingual facts, respectively. Experiments also show that the proposed framework achieves a high recall rate in extracting new facts from Hindi news articles.
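The two ideas named in the abstract, clustering articles by their knowledge base facts rather than by a Bag of Words, and bootstrapping from English facts within a cluster, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the triple format, the Jaccard similarity, the greedy single-pass clustering, the 0.3 threshold, and the function names `cluster_by_facts` and `bootstrap_sentences` are all assumptions made for illustration, and the Hindi sentences are assumed to have already been machine-translated into English.

```python
from typing import FrozenSet, Iterable, List, Tuple

Fact = Tuple[str, str, str]  # hypothetical (subject, relation, object) triple


def jaccard(a: FrozenSet[Fact], b: FrozenSet[Fact]) -> float:
    """Overlap between two fact sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_facts(articles: List[FrozenSet[Fact]],
                     threshold: float = 0.3) -> List[List[int]]:
    """Greedy single-pass clustering (an assumed stand-in technique).

    Each cluster is represented by the union of its members' facts,
    not by a Bag of Words vector.
    """
    clusters: List[List[int]] = []
    reps: List[FrozenSet[Fact]] = []
    for idx, facts in enumerate(articles):
        best, best_sim = -1, threshold
        for c, rep in enumerate(reps):
            sim = jaccard(facts, rep)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best < 0:
            # No sufficiently similar cluster: start a new one.
            clusters.append([idx])
            reps.append(facts)
        else:
            clusters[best].append(idx)
            reps[best] = reps[best] | facts
    return clusters


def bootstrap_sentences(seed_facts: Iterable[Fact],
                        translated_sentences: List[str]) -> List[str]:
    """Return translated sentences that mention an entity from the seed facts."""
    seeds = {e.lower() for s, _, o in seed_facts for e in (s, o)}
    return [s for s in translated_sentences
            if any(e in s.lower() for e in seeds)]


english_articles = [
    frozenset({("police", "arrested", "John Doe"),
               ("John Doe", "accused_of", "robbery")}),
    frozenset({("police", "arrested", "John Doe")}),
    frozenset({("court", "sentenced", "Jane Roe")}),
]
clusters = cluster_by_facts(english_articles)
candidates = bootstrap_sentences(
    english_articles[0],
    ["John Doe was caught near the bank on Monday.", "The weather was clear."],
)
```

In this toy input the first two articles share a fact and fall into one cluster, and only the translated sentence that mentions a seed entity is kept as a candidate source of new facts.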
Pages: 318-340
Page count: 23