HighLife: Higher-arity Fact Harvesting

Times cited: 21
Authors
Ernst, Patrick [1]
Siu, Amy [1]
Weikum, Gerhard [1]
Affiliation
[1] Max Planck Institute for Informatics, Saarbrücken, Germany
Source
WEB CONFERENCE 2018: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW2018) | 2018
Keywords
MODEL;
DOI
10.1145/3178876.3186000
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Text-based knowledge extraction methods for populating knowledge bases have focused on binary facts: relationships between two entities. However, in advanced domains such as health, it is often crucial to consider ternary and higher-arity relations. An example is to capture which drug is used for which disease at which dosage (e.g., 2.5 mg/day) for which kinds of patients (e.g., children vs. adults). In this work, we present an approach to harvest higher-arity facts from textual sources. Our method is distantly supervised by seed facts and uses the fact-pattern duality principle to gather fact candidates with high recall. For high precision, we devise a constraint-based reasoning method to eliminate false candidates. A major novelty is coping with the difficulty that higher-arity facts are often expressed only partially in texts and strewn across multiple sources. For example, one sentence may refer to a drug, a disease and a group of patients, whereas another sentence talks about the drug, its dosage and the target group without mentioning the disease. Our methods cope well with such partially observed facts at both the pattern-learning and the constraint-reasoning stages. Experiments with health-related documents and with news articles demonstrate the viability of our method.
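As a rough illustration of what "partially observed" higher-arity facts look like, the following Python sketch represents each sentence-level observation as a partial tuple over four roles (drug, disease, dosage, group) and greedily merges observations that agree on every role they both fill. The role names, the placeholder values DrugX and DiseaseY, the helper functions, and the greedy merge rule are all hypothetical choices made for illustration; this is not the pattern-learning or constraint-reasoning method of the paper.

```python
from typing import Dict, List, Optional, Set

# Hypothetical 4-ary role schema, following the drug/disease/dosage/patient-group example.
ROLES = ("drug", "disease", "dosage", "group")

PartialFact = Dict[str, Optional[str]]


def bound_roles(fact: PartialFact) -> Set[str]:
    """Roles that this partial observation actually fills (non-None)."""
    return {r for r in ROLES if fact.get(r) is not None}


def compatible(a: PartialFact, b: PartialFact) -> bool:
    """Hypothetical rule: two observations are compatible if they share at
    least one filled role and never disagree on a role that both fill."""
    shared = bound_roles(a) & bound_roles(b)
    return bool(shared) and all(a[r] == b[r] for r in shared)


def merge(a: PartialFact, b: PartialFact) -> PartialFact:
    """Union of two observations; roles missing in both stay None."""
    return {r: a.get(r) if a.get(r) is not None else b.get(r) for r in ROLES}


def combine_all(observations: List[PartialFact]) -> List[PartialFact]:
    """Greedily fold each observation into the first compatible candidate,
    otherwise start a new candidate (normalized to all roles)."""
    candidates: List[PartialFact] = []
    for obs in observations:
        for i, cand in enumerate(candidates):
            if compatible(cand, obs):
                candidates[i] = merge(cand, obs)
                break
        else:
            candidates.append(merge(obs, {}))
    return candidates


# Two sentences each mention only part of the same fact; a third conflicts on group.
s1 = {"drug": "DrugX", "disease": "DiseaseY", "group": "children"}
s2 = {"drug": "DrugX", "dosage": "2.5 mg/day", "group": "children"}
s3 = {"drug": "DrugX", "dosage": "5 mg/day", "group": "adults"}

facts = combine_all([s1, s2, s3])
for f in facts:
    print(f)
```

Here s1 and s2 combine into one complete 4-ary candidate, while s3 stays a separate partial candidate with no disease filled. The greedy merge is order-dependent and treats any shared filled role as sufficient evidence; a principled system would instead score candidates and resolve conflicts jointly, which is the job the abstract assigns to constraint-based reasoning.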
Pages: 1013-1022
Page count: 10