Efficient Top-k Retrieval on Massive Data

被引:10
作者
Han, Xixian [1 ]
Li, Jianzhong [2 ]
Gao, Hong [1 ]
机构
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
[2] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin, Peoples R China
基金
中国国家自然科学基金;
关键词
Massive data; T2S algorithm; table scan; early termination; selective retrieval; AGGREGATION; ALGORITHMS;
D O I
10.1109/TKDE.2015.2426691
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In many applications, top-k query is an important operation to return a set of interesting points in a potentially huge data space. It is analyzed in this paper that the existing algorithms cannot process top-k query on massive data efficiently. This paper proposes a novel table-scan-based T2S algorithm to efficiently compute top-k results on massive data. T2S first constructs the presorted table, whose tuples are arranged in the order of the round-robin retrieval on the sorted lists. T2S maintains only fixed number of tuples to compute results. The early termination checking for T2S is presented in this paper, along with the analysis of scan depth. The selective retrieval is devised to skip the tuples in the presorted table which are not top-k results. The theoretical analysis proves that selective retrieval can reduce the number of the retrieved tuples significantly. The construction and incremental-update/batch-processing methods for the used structures are proposed in this paper. The extensive experimental results, conducted on synthetic and real-life data sets, show that T2S has a significant advantage over the existing algorithms.
引用
收藏
页码:2687 / 2699
页数:13
相关论文
共 22 条
[1]  
Akbarinia Reza, 2007, INT C VERY LARGE DAT, P495
[2]  
Bast H., 2006, P 32 INT C VERY LARG, P475, DOI [DOI 10.5555/1182635.1164169, 10.5555/1182635.1164169]
[3]  
Chang YC, 2000, SIGMOD RECORD, V29, P391, DOI 10.1145/335191.335433
[4]  
Das G., 2006, Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB'06), P451
[5]   Optimal aggregation algorithms for middleware [J].
Fagin, R ;
Lotem, A ;
Naor, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :614-656
[6]  
Fagin Ronald, 2003, P 2003 ACM SIGMOD IN, P301, DOI DOI 10.1145/872757.872795
[7]  
Fernandez R., 2015, PROC BIENNIAL CONF I
[8]   Towards efficient multi-feature queries in heterogeneous environments [J].
Güntzer, U ;
Balke, WT ;
Kiessling, W .
INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: CODING AND COMPUTING, PROCEEDINGS, 2001, :622-628
[9]  
Guntzer Ulrich., 2000, VLDB J, P419
[10]   Efficient Skyline Computation on Big Data [J].
Han, Xixian ;
Li, Jianzhong ;
Yang, Donghua ;
Wang, Jinbao .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (11) :2521-2535