Organizing hidden-Web databases by clustering visible Web documents

Cited by: 0
Authors
Barbosa, Luciano [1]
Freire, Juliana [1]
Silva, Altigran [2]
Affiliations
[1] Univ Utah, Salt Lake City, UT 84112 USA
[2] Univ Fed Amazonas, Manaus, Amazonas, Brazil
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we address the problem of organizing hidden-Web databases. Given a heterogeneous set of Web forms that serve as entry points to hidden-Web databases, our goal is to cluster the forms according to the database domains to which they belong. We propose a new clustering approach that models Web forms as a set of hyperlinked objects and considers visible information in the form context (both within and in the neighborhood of forms) as the basis for similarity comparison. Since the clustering is performed over features that can be automatically extracted, the process is scalable. In addition, because it uses a rich set of metadata, our approach is able to handle a wide range of forms, including content-rich forms that contain multiple attributes, as well as simple keyword-based search interfaces. An experimental evaluation over real Web data shows that our strategy generates high-quality clusters, measured both in terms of entropy and F-measure. This indicates that our approach provides an effective and general solution to the problem of organizing hidden-Web databases.
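The two quality measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the textbook definitions of cluster entropy (size-weighted entropy of the true domain labels inside each cluster) and clustering F-measure (class-size-weighted best F1 over clusters); the paper's exact formulas may differ, and the function names and toy domain labels below are hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(labels, clusters):
    """Size-weighted average entropy of true labels per cluster (lower is better).

    Assumed textbook definition; not necessarily the paper's exact formula.
    """
    members = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        members[cluster].append(label)
    total = len(labels)
    entropy = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels)
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        entropy += (size / total) * h
    return entropy


def clustering_f_measure(labels, clusters):
    """Class-size-weighted best F1 of each true class over all clusters (higher is better)."""
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    overlap = Counter(zip(labels, clusters))
    total = len(labels)
    score = 0.0
    for label, n_label in class_sizes.items():
        best = 0.0
        for cluster, n_cluster in cluster_sizes.items():
            n = overlap[(label, cluster)]
            if n == 0:
                continue
            precision = n / n_cluster
            recall = n / n_label
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_label / total) * best
    return score


if __name__ == "__main__":
    # Hypothetical toy data: four forms from two database domains.
    domains = ["airfare", "airfare", "hotel", "hotel"]
    perfect = [0, 0, 1, 1]   # each cluster holds exactly one domain
    merged = [0, 0, 0, 0]    # everything lumped into a single cluster
    print(clustering_entropy(domains, perfect), clustering_f_measure(domains, perfect))
    print(clustering_entropy(domains, merged), clustering_f_measure(domains, merged))
```

Under these assumed definitions, a clustering that separates the domains cleanly has entropy 0 and F-measure 1, while lumping all forms into one cluster raises the entropy and lowers the F-measure.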
Pages: 301 / +
Page count: 2