Top-k answers for XML keyword queries

被引:4
作者
Khanh Nguyen [1 ]
Cao, Jinli [1 ]
机构
[1] La Trobe Univ, Dept Comp Sci & Comp Engn, Melbourne, Vic 3086, Australia
来源
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2012年 / 15卷 / 5-6期
关键词
XML data; keyword search; top-k answers; mutual information; skyline queries; LOWEST COMMON ANCESTORS; ALGORITHMS; SKYLINE; SEARCH;
D O I
10.1007/s11280-011-0145-1
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Searching XML data using keyword queries has attracted much attention because it enables Web users to easily access XML data without having to learn a structured query language or study possibly complex data schemas. Most of the current approaches identify the meaningful results of a given keyword query based on the semantics of lowest common ancestor (LCA) and its variants. However, given the fact that LCA candidates are usually numerous and of low relevance to the users' information need, how to effectively and efficiently identify the most relevant results from a large number of LCA candidates is still a challenging and unresolved issue. In this article, we introduce a novel semantics of relevant results based on mutual information between the query keywords. Then, we introduce a novel approach for identifying the relevant answers of a given query by adopting skyline semantics. We also recommend three different ranking criteria for selecting the top-k relevant results of the query. Efficient algorithms are proposed which rely on some provable properties of the dominance relationship between result candidates to rapidly identify the top-k dominant results. Extensive experiments were conducted to evaluate our approach and the results show that the proposed approach has a good performance compared with other existing approaches in different data sets and evaluation metrics.
引用
收藏
页码:485 / 515
页数:31
相关论文
共 34 条
[1]  
[Anonymous], 2003, Proceedings of the 2003 ACM SIGMOD international conference on Management of data
[2]  
Baeza-Yates R, 1999, MODERN INFORM RETRIE, V463
[3]   Efficient Sort-Based Skyline Evaluation [J].
Bartolini, Ilaria ;
Ciaccia, Paolo ;
Patella, Marco .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2008, 33 (04)
[4]   Lowest common ancestors in trees and directed acyclic graphs [J].
Bender, MA ;
Farach-Colton, M ;
Pemmasani, G ;
Skiena, S ;
Sumazin, P .
JOURNAL OF ALGORITHMS-COGNITION INFORMATICS AND LOGIC, 2005, 57 (02) :75-94
[5]   The Skyline operator [J].
Börzsönyi, S ;
Kossmann, D ;
Stocker, K .
17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001, :421-430
[6]   Skyline with presorting [J].
Chomicki, J ;
Godfrey, P ;
Gryz, J ;
Liang, DM .
19TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2003, :717-719
[7]  
Cohen S., 2005, P 14 ACM INT C INFOR, P389
[8]  
Cohen S., 2003, P VERY LARGE DATA BA, P45
[9]  
Cover T.M., 2006, ELEMENTS INFORM THEO, V2nd ed
[10]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656