A Framework for Extracting Information from Semi-Structured Web Data Sources

Cited by: 7
Authors
Shaker, Malunoud [1]
Ibrahim, Hamidah [1]
Mustapha, Aida [1]
Abdullah, Lili Nurliyana [1]
Affiliation
[1] Univ Putra Malaysia, Dept Comp Sci, Fac Comp Sci & Informat Technol, Serdang 43400, Malaysia
Source
THIRD 2008 INTERNATIONAL CONFERENCE ON CONVERGENCE AND HYBRID INFORMATION TECHNOLOGY, VOL 1, PROCEEDINGS | 2008
DOI
10.1109/ICCIT.2008.60
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
Nowadays, many users rely on web search engines to find and gather information, and they face an increasing number of diverse semi-structured information sources. Correlating, integrating and presenting related information to users has therefore become important. When a user seeks specific information with a search engine such as Yahoo or Google, the results include not only pages that provide the desired information but also other pages on which that information is merely mentioned, and the number of returned pages is enormous. The performance capabilities and limitations of web search engines, and the overlap among their results for the same queries, are therefore an important and large area of research. Extracting information from web data sources has also become very important: the massive and growing number of diverse semi-structured information sources available to users on the Internet, together with the variety of web pages, makes information extraction from the web a challenging problem. This paper proposes a framework for extracting, classifying and browsing semi-structured web data sources. The framework is able to extract relevant information from different web data sources and classify the extracted information based on the standard classification of Nokia products.
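The abstract does not say how extraction and classification are carried out. As a rough illustration of the extract-then-classify idea only, the sketch below assumes a source page lists products as rows of an HTML table and labels each row with simple keyword rules. The `CATEGORY_KEYWORDS` labels, the `CellExtractor` class and the `extract_and_classify` function are hypothetical; they are neither the authors' method nor the actual Nokia product classification.

```python
from html.parser import HTMLParser

# Hypothetical keyword rules standing in for the Nokia product classification,
# which the abstract does not spell out. Checked in insertion order, so more
# specific categories (accessories) must come before broader ones (phones).
CATEGORY_KEYWORDS = {
    "accessory": ("charger", "headset", "battery", "case"),
    "phone": ("nokia",),
}


class CellExtractor(HTMLParser):
    """Collect the text of every <td> cell, grouped by its <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows made only of <th> cells produce an empty row; skip them.
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None

    def handle_data(self, data):
        # Text outside <td> cells (headers, page chrome) is ignored.
        if self._in_cell:
            self._row[-1] += data


def classify(name):
    """Return the first category whose keywords occur in the product name."""
    lowered = name.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unclassified"


def extract_and_classify(html):
    """Extract (name, price) rows from an HTML table and label each row."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()
    return [
        {
            "name": row[0],
            "price": row[1] if len(row) > 1 else None,
            "category": classify(row[0]),
        }
        for row in parser.rows
    ]
```

For example, feeding it a table with rows for "Nokia N95", "Nokia AC-5 charger" and "Generic cable" labels them `phone`, `accessory` and `unclassified` respectively. Keyword rules are used here only because they are easy to follow; a real system would need per-source wrappers and a richer classifier.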
Pages: 27-31
Number of pages: 5