Corroborate and Learn Facts from the Web

Authors
Zhao, Shubin [1]
Betz, Jonathan [1]
Affiliation
[1] Google Inc, New York, NY 10011 USA
Source
KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining | 2007
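Keywords
Web Mining; Information Extraction; Bootstrapping
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract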
The web contains a great deal of interesting factual information about entities such as celebrities, movies, or products. This paper describes a robust bootstrapping approach that corroborates known facts and learns new facts simultaneously. The approach starts by retrieving relevant pages from a crawl repository for each entity in the seed set. In each learning cycle, the known facts of an entity are first corroborated against a relevant page to locate fact mentions. When fact mentions are found, they serve as examples for learning new facts from the page via HTML pattern discovery. Newly extracted facts are added to the known fact set for the next learning cycle, and the bootstrapping process continues until no new facts can be learned. The approach is language-independent. It performed well in an experiment on country facts, and results of a large-scale experiment with initial facts imported from Wikipedia are also presented.
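The corroborate-then-extract loop described in the abstract can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how HTML patterns are discovered, so the following are all hypothetical simplifications: the helper names `induce_pattern` and `bootstrap`, the (attribute, value) fact representation, the 40-character gap limit, and the rule that a pattern is simply the markup surrounding one corroborated mention.

```python
from __future__ import annotations

import re
from typing import Tuple

# Hypothetical fact representation: (attribute, value) for a single entity.
Fact = Tuple[str, str]


def induce_pattern(page: str, attr: str, value: str, max_gap: int = 40) -> re.Pattern[str] | None:
    """Corroborate one known fact in `page` and, if found, generalize its HTML context into a regex."""
    # Corroboration (simplified assumption): attribute followed by value within `max_gap` characters.
    mention = re.search(re.escape(attr) + rf"(.{{0,{max_gap}}}?)" + re.escape(value), page, re.DOTALL)
    if mention is None:
        return None
    middle = mention.group(1)
    if "<" not in middle:
        return None  # assumption: only learn from mentions separated by markup
    # Assumed context rule: from the nearest '<' before the attribute
    # to the nearest '>' after the value.
    left_start = page.rfind("<", 0, mention.start())
    right_end = page.find(">", mention.end())
    if left_start == -1 or right_end == -1:
        return None
    left = page[left_start:mention.start()]
    right = page[mention.end():right_end + 1]
    # Keep the markup literal; replace attribute and value text with capture groups.
    return re.compile(re.escape(left) + r"([^<>]+?)" + re.escape(middle) + r"([^<>]+?)" + re.escape(right))


def bootstrap(seed: set[Fact], pages: list[str], max_cycles: int = 10) -> set[Fact]:
    """Alternate corroboration and extraction until no new facts are learned (hypothetical helper)."""
    known = set(seed)
    for _ in range(max_cycles):
        learned: set[Fact] = set()
        for page in pages:
            for attr, value in sorted(known):
                pattern = induce_pattern(page, attr, value)
                if pattern is None:
                    continue  # fact not corroborated on this page
                for m in pattern.finditer(page):
                    fact = (m.group(1).strip(), m.group(2).strip())
                    if fact not in known:
                        learned.add(fact)
        if not learned:
            break  # fixpoint reached: no new facts, stop bootstrapping
        known |= learned
    return known


# Toy page for illustration only (not data from the paper).
sample_page = (
    "<table>\n"
    "<tr><td>Capital</td><td>Paris</td></tr>\n"
    "<tr><td>Currency</td><td>Euro</td></tr>\n"
    "<tr><td>Population</td><td>67 million</td></tr>\n"
    "</table>"
)

if __name__ == "__main__":
    print(sorted(bootstrap({("Capital", "Paris")}, [sample_page])))
    # [('Capital', 'Paris'), ('Currency', 'Euro'), ('Population', '67 million')]
```

Seeded with one fact, the first cycle corroborates it, induces the `<td>…</td><td>…</td>` pattern, and extracts the other two rows. The second cycle learns nothing new, so the loop stops. A practical system would also need to score and validate patterns before trusting their output, and this sketch omits that step.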
Pages: 995-1003
Number of pages: 9