Automatic generation of Web mining environments

Cited by: 0
Authors
Cibelli, M [1]
Costagliola, G [1]
Affiliation
[1] Univ Salerno, Dipartimento Matemat & Informat, I-84081 Baronissi, SA, Italy
Source
DATA MINING AND KNOWLEDGE DISCOVERY: THEORY, TOOLS, AND TECHNOLOGY | 1999 / Vol. 3695
Keywords
web mining; information extraction; web miner generation; domain-specific search engine
DOI
10.1117/12.339984
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The main problem in retrieving information from the World Wide Web is the enormous number of unstructured documents and resources, which makes it difficult to locate and track appropriate sources. This paper presents a Web Mining Environment (WME) capable of finding, extracting, and structuring information related to a particular domain from web documents using general-purpose indices. The WME architecture includes a Web Engine Filter (WEF) that sorts and reduces the answer set returned by a web engine, a Data Source Pre-processor (DSP) that processes HTML layout cues to collect and qualify page segments, and a Heuristic-based Information Extraction System (HIES) that finally retrieves the required data. Furthermore, we present a Web Mining Environment Generator (WMEG) that allows naive users to generate a WME specific to a given domain by providing a set of specifications.
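To make the four components named in the abstract more concrete, here is a minimal, hypothetical sketch of how a domain specification could drive a WEF → DSP → HIES pipeline, with a WMEG-like function binding the stages to one spec. Everything concrete here is an assumption made for illustration and is not the authors' implementation: the `SPEC` format, the keyword-count ranking in the filter, the set of block-level tags used as layout cues, the regex heuristics, and the offline `fetch` callable standing in for real page retrieval.

```python
import re
from html.parser import HTMLParser

# Hypothetical domain specification a naive user might supply (illustrative only).
SPEC = {
    "keywords": ["conference", "deadline", "submission"],
    "top_k": 2,
    "fields": {
        "deadline": r"[Dd]eadline:?\s+([A-Z][a-z]+ \d{1,2}, \d{4})",
        "location": r"held in ([A-Z][a-zA-Z]+(?:, [A-Z][a-zA-Z]+)?)",
    },
}

def web_engine_filter(results, spec):
    """WEF-like step (assumed ranking): score (url, snippet) hits by keyword
    count, drop hits with no keyword, keep the top_k best-ranked URLs."""
    scored = []
    for url, snippet in results:
        text = snippet.lower()
        score = sum(text.count(k) for k in spec["keywords"])
        if score > 0:
            scored.append((score, url))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [url for _, url in scored[: spec["top_k"]]]

class SegmentParser(HTMLParser):
    """DSP-like step: treat block-level tags as layout cues and cut the page
    into (enclosing_tag, text) segments. The tag set is an assumption."""
    BLOCK_TAGS = {"title", "h1", "h2", "h3", "p", "li", "td", "div"}

    def __init__(self):
        super().__init__()
        self.segments = []
        self._stack = []  # currently open block tags
        self._buf = []    # text collected since the last layout cue

    def flush(self):
        text = " ".join("".join(self._buf).split())  # normalise whitespace
        if text:
            tag = self._stack[-1] if self._stack else "body"
            self.segments.append((tag, text))
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self.flush()
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self.flush()
            if self._stack and self._stack[-1] == tag:
                self._stack.pop()

    def handle_data(self, data):
        self._buf.append(data)

def preprocess(html):
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text
    return parser.segments

def extract(segments, spec):
    """HIES-like step (assumed heuristics): for each field, return the first
    regex match found while scanning segments in document order."""
    record = {}
    for field, pattern in spec["fields"].items():
        for _, text in segments:
            match = re.search(pattern, text)
            if match:
                record[field] = match.group(1)
                break
    return record

def generate_wme(spec):
    """WMEG-like step: bind the three stages to one spec, return a miner."""
    def wme(results, fetch):
        return {url: extract(preprocess(fetch(url)), spec)
                for url in web_engine_filter(results, spec)}
    return wme

# Offline usage with made-up pages and search-engine hits.
PAGES = {
    "http://a.example/cfp": "<html><title>CFP</title><p>Submission deadline: "
                            "March 15, 1999</p><p>The conference is held in "
                            "Springfield.</p></html>",
    "http://b.example/misc": "<p>Unrelated page</p>",
}
RESULTS = [("http://b.example/misc", "a page about cooking"),
           ("http://a.example/cfp", "conference call: submission deadline")]

miner = generate_wme(SPEC)
print(miner(RESULTS, PAGES.get))
# {'http://a.example/cfp': {'deadline': 'March 15, 1999', 'location': 'Springfield'}}
```

In this sketch the "generator" is simply a closure over the specification, so changing the domain means supplying a different `SPEC` rather than writing new code, which is one plausible reading of how a WMEG lets naive users obtain a domain-specific WME.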
Pages: 215-225
Page count: 3