Ontology-based information extraction from the World Wide Web

Cited by: 1
Authors
Korst, Jan [1]
Geleijnse, Gijs
de Jong, Nick
Verschoor, Michael [1]
Affiliations
[1] Eindhoven Univ Technol, Automat ontol driven extract & Struct Informat In, Eindhoven, Netherlands
Source
INTELLIGENT ALGORITHMS IN AMBIENT AND BIOMEDICAL COMPUTING | 2006 / Vol. 7
Keywords
information extraction; ontology; Google; World Wide Web; famous persons;
DOI
10.1007/1-4020-4995-1_10
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study possibilities to automatically extract information from the Internet, by structuring and combining data from web pages. The web pages are found with the use of a search engine and the information is structured by using ontologies. The ontologies are populated with the use of statistical and linguistic techniques. We present the results of a case study that is aimed at finding the names of famous persons. The results indicate that, even if we only use the summaries that Google provides of web pages, the approach results in a high precision and recall for the specific application.
Pages: 149 / +
Page count: 4