Approximate sorting and its applications in I/O model

被引:0
作者
Gao, Tianpeng [1 ,2 ]
Li, Jianzhong [1 ,2 ]
机构
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
基金
中国国家自然科学基金;
关键词
Approximate sorting; I/O model; External metric; Spearman's footrule metric; Index; Query; Sort-merge join;
D O I
10.1016/j.tcs.2023.114348
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
Sorting is a fundamental component in many different applications including web indexing engines, geographic information systems, data mining and database systems. However, the existing optimal algorithms still cause huge costs when the input data becomes very large. Thus, the approximate sorting for big data is considered in this paper. The goal of approximate sorting for big data is to generate an approximate sorted result, but using less CPU and I/O cost. For big data, we consider the approximate sorting and applications for approximate sorted results in I/O model. The quality of approximate sorting results is usually measured by the distance metrics on permutation space. However, the existing metrics on permutation space are not available for external approximate sorting algorithms. Thus, we propose a new kind of metric named External metric, which ignores the errors and dislocations that happened in each I/O block. The External metric of Spearman's footrule metric is named as External Spearman's footrule metric (short as ESP metric), which is analyzed in this paper. Furthermore, to facilitate a better evaluation of the approximate sorted result, we propose a new metric, named as errors, which directly states the number of dislocation of the elements. The External errors are also considered in this paper. Then, according to the rate -distortion relationship under these two metrics, the lower bound of these two metrics on external approximate sorting problem with I/O operations is proved. We propose a k -passes external approximate sorting algorithm, named as EASORT, and prove that EASORT is asymptotically optimal under ESP metric. Finally, we consider the applications on approximate sorting results. An index for the approximate sorted result is proposed. The single and range query on approximate sorted result using this index are also analyzed. Further, the sort -merge join on two relations, where one of the relations is approximate sorted or both relations are approximate sorted, are all discussed in this paper.
引用
收藏
页数:19
相关论文
共 22 条
  • [1] THE INPUT OUTPUT COMPLEXITY OF SORTING AND RELATED PROBLEMS
    AGGARWAL, A
    VITTER, JS
    [J]. COMMUNICATIONS OF THE ACM, 1988, 31 (09) : 1116 - 1127
  • [2] BF-Tree: Approximate Tree Indexing
    Athanassoulis, Manos
    Ailamaki, Anastasia
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2014, 7 (14): : 1881 - 1892
  • [3] Ben-Moshe Sagi, 2011, P 14 INT C DAT THEOR, P256
  • [4] SPACE/TIME TRADE/OFFS IN HASH CODING WITH ALLOWABLE ERRORS
    BLOOM, BH
    [J]. COMMUNICATIONS OF THE ACM, 1970, 13 (07) : 422 - &
  • [5] Battery-Free Wireless Sensor Networks: A Comprehensive Survey
    Cai, Zhipeng
    Chen, Quan
    Shi, Tuo
    Zhu, Tongxin
    Chen, Kunyi
    Li, Yingshu
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (06) : 5543 - 5570
  • [6] Distributed Query Processing in the Edge-Assisted IoT Data Monitoring System
    Cai, Zhipeng
    Shi, Tuo
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (16) : 12679 - 12693
  • [7] SPEARMANS FOOTRULE AS A MEASURE OF DISARRAY
    DIACONIS, P
    GRAHAM, RL
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (02): : 262 - 268
  • [8] Disser Y, 2017, Arxiv, DOI arXiv:1702.05932
  • [9] RIGHT INVARIANT METRICS AND MEASURES OF PRESORTEDNESS
    ESTIVILLCASTRO, V
    MANNILA, H
    WOOD, D
    [J]. DISCRETE APPLIED MATHEMATICS, 1993, 42 (01) : 1 - 16
  • [10] Farnoud F, 2014, Arxiv, DOI arXiv:1401.3093