Refined Commonsense Knowledge From Large-Scale Web Contents

Cited by: 7
Authors
Nguyen, Tuan-Phong [1 ]
Razniewski, Simon [1 ]
Romero, Julien [2 ]
Weikum, Gerhard [1 ]
Affiliations
[1] Max Planck Inst Informat, D-66123 Saarbrucken, Germany
[2] Telecom SudParis, F-91000 Evry, France
Keywords
Semantics; Commonsense reasoning; Knowledge representation; Dinosaurs; Cleaning; Taxonomy; Task analysis; Commonsense knowledge; knowledge base construction; SENSE; CYC
DOI
10.1109/TKDE.2022.3206505
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Commonsense knowledge (CSK) about concepts and their properties is helpful for AI applications. Prior works, such as ConceptNet, have compiled large CSK collections. However, they are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and strings for P and O. This paper presents a method called Ascent++ to automatically build a large-scale knowledge base (KB) of CSK assertions, with refined expressiveness and both better precision and recall than prior works. Ascent++ goes beyond SPO triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter is essential to express the temporal and spatial validity of assertions and further qualifiers. Furthermore, Ascent++ combines open information extraction (OpenIE) with judicious cleaning and ranking by typicality and saliency scores. For high coverage, our method taps into the large-scale crawl C4 with broad web contents. The evaluation with human judgments shows the superior quality of the Ascent++ KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of Ascent++. A web interface, data, and code can be accessed at https://ascentpp.mpi-inf.mpg.de/.
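To make the abstract's notion of refined expressiveness concrete, the following is a minimal illustrative sketch (not the authors' code or data model) of how a faceted CSK assertion with a composite subject, semantic facets, and typicality/saliency scores might be represented; all class and field names are hypothetical.

    # Hypothetical representation of an Ascent++-style faceted assertion.
    from dataclasses import dataclass, field
    from typing import Dict

    @dataclass
    class Assertion:
        subject: str                     # primary concept, e.g., "elephant"
        subgroup: str = ""               # composite-concept refinement, e.g., "baby elephant"
        aspect: str = ""                 # concept aspect, e.g., "trunk of elephant"
        predicate: str = ""              # open-vocabulary relation from OpenIE
        obj: str = ""                    # object phrase
        facets: Dict[str, str] = field(default_factory=dict)  # e.g., temporal/spatial qualifiers
        typicality: float = 0.0          # how typical the property is for the concept
        saliency: float = 0.0            # how salient/characteristic the property is

    # Example: "elephants live in herds [location: in the savanna]"
    a = Assertion(subject="elephant", predicate="live in", obj="herds",
                  facets={"location": "in the savanna"},
                  typicality=0.9, saliency=0.7)
    print(a)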
Pages: 8431-8447
Page count: 17