KPS: a Web information mining algorithm

Times cited: 5
Authors
Guan, T [1]
Wong, KF
Affiliations
[1] Univ Regina, Dept Comp Sci, Regina, SK S4S 0A2, Canada
[2] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Peoples R China
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
information extraction; information retrieval; Web query; Web databases
DOI
10.1016/S1389-1286(99)00048-1
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Most information on the Web is semi-structured, but the structured data hidden in a Web page are not easy to search for and extract. Current practice addresses this problem through either (1) syntax analysis (i.e., analysis of HTML tags) or (2) wrappers or user-defined declarative languages. The former is suitable only for highly structured Web sites. The latter is time-consuming and scales poorly: wrappers can handle tens of information sources, but certainly not thousands. In this paper, we present KPS, a novel algorithm for mining semi-structured information on the Web. KPS employs keywords, patterns, and/or samples to mine the desired information. Experimental results show that KPS is more efficient than existing Web extraction methods. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
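The abstract says only that KPS combines keywords, patterns, and/or samples. It does not say how. As a rough illustration of the general keyword-plus-pattern idea, and explicitly not the authors' KPS algorithm, the sketch below strips HTML tags, locates a user-supplied keyword in the page text, and takes the first match of a regular-expression pattern that appears right after it. The helper names (`_TextCollector`, `extract_by_keyword`) and the sample page are hypothetical and exist only for this illustration.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collects non-empty visible text chunks, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_by_keyword(html, keyword, value_pattern):
    """Hypothetical helper (not KPS itself): for each text chunk containing
    `keyword` (case-insensitive), return the first `value_pattern` match found
    in the rest of that chunk or, failing that, in the next chunk."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    chunks = parser.chunks
    regex = re.compile(value_pattern)
    results = []
    for i, chunk in enumerate(chunks):
        pos = chunk.lower().find(keyword.lower())
        if pos == -1:
            continue
        # The value is assumed to follow the keyword: same chunk first, then the next one.
        candidates = [chunk[pos + len(keyword):]] + chunks[i + 1:i + 2]
        for text in candidates:
            match = regex.search(text)
            if match:
                results.append(match.group(0))
                break
    return results


# Illustrative semi-structured page built from this record's own metadata.
page = """
<table>
  <tr><td>Title</td><td>KPS: a Web information mining algorithm</td></tr>
  <tr><td>Pages</td><td>1495-1507</td></tr>
  <tr><td>DOI</td><td>10.1016/S1389-1286(99)00048-1</td></tr>
</table>
"""

print(extract_by_keyword(page, "Pages", r"\d+-\d+"))          # ['1495-1507']
print(extract_by_keyword(page, "DOI", r"10\.\d{4,}/\S+"))     # ['10.1016/S1389-1286(99)00048-1']
```

In this sketch, the keyword narrows the search to one region of the page and the pattern constrains the shape of the value. No site-specific wrapper is written. This is the kind of trade-off the abstract contrasts with hand-built wrappers. How KPS actually uses samples, and how it ranks candidates, is described in the paper rather than here.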
Pages: 1495-1507
Number of pages: 13