Open Information Extraction from the Web

Cited by: 299

Authors:

Etzioni, Oren [1]

Banko, Michele

Soderland, Stephen [2]

Weld, Daniel S.

Affiliations:

[1] Univ Washington, Turing Ctr, Seattle, WA 98195 USA

[2] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA

Source:

COMMUNICATIONS OF THE ACM | 2008 / Vol. 51 / No. 12

Keywords:

DOI:

10.1145/1409360.1409378

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Open Information Extraction (IE), where the identities of the relations to be extracted are unknown and the billions of documents found on the web necessitate highly scalable processing, is a reliable way of extracting information from the Internet. The first IE systems relied on some form of pattern-matching rules that were manually crafted for each domain. Modern IE automatically learns an extractor from a training set in which domain-specific examples are tagged. Developing suitable training data for IE requires substantial effort and expertise. The KnowItAll web IE system automates IE by learning to label its own training examples using only a small set of domain-independent extraction patterns. TextRunner is a fully implemented Open IE system that uses a two-phase architecture. Its first phase relies on a general model of language, training a graphical model called a conditional random field (CRF). Open IE also supports aggregating, or fusing, information across a large number of web pages.

Pages: 68-74

Page count: 7
