Provisional reporting for rank joins

被引：5

作者：

Abid, Adnan ^{[1
,2
]}

Tagliasacchi, Marco ^{[1
]}

机构：

[1] Politecn Milan, Dipartimento Elettron & Informaz, I-20133 Milan, Italy

[2] Univ Cent Punjab, Fac Informat Technol, Lahore, Pakistan

来源：

JOURNAL OF INTELLIGENT INFORMATION SYSTEMS | 2013年 / 40卷 / 03期

基金：

欧洲研究理事会;

关键词：

Rank joins; Top-K queries; Probabilistic reporting; Ranking in web services; TOP-K QUERIES;

D O I：

10.1007/s10844-012-0234-3

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Rank join operators perform a relational join among two or more relations, assign numeric scores to the join results based on a given scoring function, and return K join results with the highest scores, while accessing a subset of data from the input relations. Most of the rank join operators compute a score upper bound for a join result that can be potentially obtained after retrieving the unseen data. A join result is kept in an output buffer, and is deterministically reported to the user if its score is greater than or equal to the score upper bound. The value of the score upper bound decreases subject to further data extraction from the relations. In case of Web services as data sources, which are characterized by non-negligible response time for every data fetch, the value of score upper bound might decrease slowly. Consequently, there is a long delay in reporting a join result stored in the output buffer. This paper addresses the problem of efficiently reporting a top join result obtained by joining the data of two Web services, which are characterized by non-negligible response time. We present a probabilistic reporting method which computes the confidence with which a join result may appear among final top-K joins. It reports a join result as soon as the measure of its confidence exceeds a given threshold. This helps in reporting a join result soon after its observation. An extensive experimental study with various settings of different operating parameters validates the importance of the proposed approach on both real and synthetic data sets. The results show that our proposed approach significantly reduces the average difference between the time when a join result is observed and the time when it is reported, while incurring negligible errors in the final results.

引用

页码：479 / 500

页数：22

共 25 条

[1]

Abid A, 2011, LECT NOTES COMPUT SC, V6757, P44, DOI 10.1007/978-3-642-22233-7_4

[2]

[Anonymous], 2007, P IEEE INT C DAT ENG

[3]

[Anonymous], 2003, Proc. of the ACM SIGMOD International Conference on Management of Data, DOI DOI 10.1145/872757

[4]

Arai B., 2007, VLDB, P914

[5] Anytime measures for top-k algorithms on exact and fuzzy data sets [J].

Arai, Benjamin ;

Das, Gautam ;

Gunopulos, Dimitrios ;

Koudas, Nick .

VLDB JOURNAL, 2009, 18 (02) :407-427

[6] THE MANAGEMENT OF PROBABILISTIC DATA [J].

BARBARA, D ;

GARCIAMOLINA, H ;

PORTER, D .

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 1992, 4 (05) :487-502

[7] Mashing up search services [J].

Braga, Daniele ;

Ceri, Stefano ;

Martinenghi, Davide ;

Daniel, Florian .

IEEE INTERNET COMPUTING, 2008, 12 (05) :16-23

[8] Evaluating Top-k queries over web-accessible Databases [J].

Bruno, N ;

Gravano, L ;

Marian, A .

18TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2002, :369-+

[9] Top-k selection queries over relational databases:: Mapping strategies and performance evaluation [J].

Bruno, N ;

Chaudhuri, S ;

Gravano, L .

ACM TRANSACTIONS ON DATABASE SYSTEMS, 2002, 27 (02) :153-187

[10]

Ceri S., 2010, LECT NOTES COMPUTER

← 1 2 3 →