Formal concept analysis approach for data extraction from a limited deep web database

Cited by: 0
Authors
Zhuo Zhang
Juan Du
Liming Wang
Affiliations
[1] School of Information Engineering, Zhengzhou University
[2] Information Technology Engineering
[3] Yellow River Conservancy Technical Institute
Source
Journal of Intelligent Information Systems | 2013 / Vol. 41
Keywords
Algorithms; Formal concept analysis; Lower cover; Data extraction; Limited web database
DOI
Not available
Abstract
Few studies have addressed the problem of extracting data from a limited deep web database. We apply formal concept analysis to this problem and propose a novel algorithm called EdaliwdbFCA. Before sending a query Y, the algorithm analyzes the local formal context KL, which consists of the most recently extracted data, and predicts the size of the query result from the cardinality of the extent X of the formal concept (X,Y) derived from KL. It can therefore decide in advance whether Y qualifies as a query. Candidate query concepts are generated dynamically from the lower cover of the current concept (X,Y), so the method avoids building a complete concept lattice during extraction. In addition, two pruning rules reduce redundant queries. Experiments on controlled data sets and real applications confirm that the underlying theory is correct and that the algorithm can be applied effectively in the real world.
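The two ideas the abstract leans on, the extent cardinality of a concept (X,Y) in a local context and the lower cover of that concept, can be illustrated with a minimal Python sketch. This is not the EdaliwdbFCA algorithm itself: the toy records, the attribute strings, and the helper names (extent, intent, concept_of, lower_cover) are all assumptions made for illustration, and the pruning rules and query-selection criteria of the paper are not reproduced.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# A formal context: object id -> set of attributes (hypothetical toy data).
Context = Dict[str, Set[str]]
Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent X, intent Y)


def extent(ctx: Context, attrs) -> FrozenSet[str]:
    """Objects that carry every attribute in attrs."""
    return frozenset(g for g, a in ctx.items() if attrs <= a)


def intent(ctx: Context, objs) -> FrozenSet[str]:
    """Attributes shared by every object in objs."""
    shared = set(frozenset().union(*ctx.values()))
    for g in objs:
        shared &= ctx[g]
    return frozenset(shared)


def concept_of(ctx: Context, attrs) -> Concept:
    """Formal concept generated by an attribute set (its closure)."""
    x = extent(ctx, attrs)
    return x, intent(ctx, x)


def lower_cover(ctx: Context, concept: Concept) -> List[Concept]:
    """Lower neighbours of a concept, computed without building the lattice.

    Each attribute outside the intent yields a candidate sub-concept;
    only candidates whose extents are maximal are lower neighbours.
    """
    _, y = concept
    all_attrs = frozenset().union(*ctx.values())
    candidates = {concept_of(ctx, y | {m}) for m in all_attrs - y}
    return [c for c in candidates
            if not any(c[0] < d[0] for d in candidates)]


# Hypothetical local context built from already-extracted records.
ctx: Context = {
    "r1": {"make=Ford", "color=red"},
    "r2": {"make=Ford", "color=blue"},
    "r3": {"make=Audi", "color=red"},
}

top = concept_of(ctx, frozenset())          # concept for the empty query
covers = lower_cover(ctx, top)              # candidate next query concepts
local_size = len(extent(ctx, {"make=Ford"}))  # |X| for Y = {make=Ford}
```

Here local_size is the extent cardinality within the local context only; how that number is turned into a decision about sending a query is specific to the paper and is left out of the sketch.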
Pages: 211-234
Page count: 23