The YAGO-NAGA approach to knowledge discovery

Cited by: 29
Authors
Max Planck Institute for Informatics, D-66123 Saarbruecken, Germany [1]
Affiliations
[1] Max Planck Institute for Informatics
Source
SIGMOD Rec. | 2008 / Issue 4 / pp. 41-47
Keywords
Knowledge-based systems
DOI
10.1145/1519103.1519110
Abstract
This paper gives an overview of the YAGO-NAGA approach to information extraction for building a conveniently searchable, large-scale, highly accurate knowledge base of common facts. YAGO harvests Wikipedia infoboxes and category names for facts about individual entities and reconciles these with the taxonomic backbone of WordNet to ensure that all entities have proper classes and that the class system is consistent. The YAGO knowledge base currently contains about 19 million instances of binary relations for about 1.95 million entities. Based on intensive sampling, its accuracy is estimated to be above 95 percent. The paper presents the architecture of the YAGO extractor toolkit, its distinctive approach to consistency checking, its provisions for maintenance and further growth, and NAGA, the query engine for YAGO. It also discusses ongoing work on extensions that integrate fact candidates extracted from natural-language text sources.
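As an illustration of what a class-based consistency check on binary-relation facts could look like, here is a minimal Python sketch. The abstract does not describe the actual mechanism, so everything in it is an assumption: the toy taxonomy `SUBCLASS_OF`, the relation signatures in `SIGNATURES`, the `Fact` record, and the `accept` function are hypothetical names chosen for this example and are not taken from the YAGO toolkit.

```python
from dataclasses import dataclass

# Hypothetical toy taxonomy: child class -> parent class (subClassOf edges).
SUBCLASS_OF = {
    "physicist": "scientist",
    "scientist": "person",
    "city": "location",
}

# Hypothetical type signatures: relation -> (domain class, range class).
SIGNATURES = {
    "bornIn": ("person", "location"),
}


@dataclass(frozen=True)
class Fact:
    """One instance of a binary relation: relation(subject, obj)."""
    subject: str
    relation: str
    obj: str


def is_subclass(cls, ancestor):
    """Walk subClassOf edges upward until ancestor is found or the chain ends."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def accept(fact, entity_class):
    """Accept a candidate only if both arguments fit the relation's signature."""
    if fact.relation not in SIGNATURES:
        return False
    domain, rng = SIGNATURES[fact.relation]
    subj_cls = entity_class.get(fact.subject)
    obj_cls = entity_class.get(fact.obj)
    if subj_cls is None or obj_cls is None:
        return False  # entities without a class are rejected in this sketch
    return is_subclass(subj_cls, domain) and is_subclass(obj_cls, rng)


# Assumed entity-to-class assignments and two candidate facts.
entity_class = {"Max_Planck": "physicist", "Kiel": "city"}
candidates = [
    Fact("Max_Planck", "bornIn", "Kiel"),
    Fact("Kiel", "bornIn", "Max_Planck"),  # arguments swapped, violates the signature
]
knowledge_base = [f for f in candidates if accept(f, entity_class)]
```

In this sketch the first candidate is kept and the swapped one is rejected, because a city is not a subclass of person. A real system would need a much larger taxonomy and would have to handle entities that belong to several classes.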
Pages: 41-47
Page count: 7