Authors:
Echihabi, Karima [1]
Fatourou, Panagiota [2,3]
Zoumpatianos, Kostas [4]
Palpanas, Themis [2,5]
Benbrahim, Houda [6,7]
Affiliations:
[1] Mohammed VI Polytech Univ, Ben Guerir, Morocco
[2] Univ Paris Cite, Paris, France
[3] FORTH, Paris, France
[4] Snowflake Inc, Bozeman, MT USA
[5] IUF, Paris, France
[6] IRDA, Rabat IT Ctr, Rabat, Morocco
[7] ENSIAS, Rabat, Morocco
Source:
PROCEEDINGS OF THE VLDB ENDOWMENT | 2022 / Vol. 15 / Issue 10
Keywords:
LERNAEAN HYDRA;
DOI:
10.14778/3547305.3547308
Chinese Library Classification:
TP [Automation Technology, Computer Technology];
Discipline classification code:
0812;
Abstract:
We propose Hercules, a parallel tree-based technique for exact similarity search on massive disk-based data series collections. We present novel index construction and query answering algorithms that leverage different summarization techniques, carefully schedule costly operations, optimize memory and disk accesses, and exploit the multi-threading and SIMD capabilities of modern hardware to perform CPU-intensive calculations. We demonstrate the superiority and robustness of Hercules with an extensive experimental evaluation against state-of-the-art techniques, using many synthetic and real datasets, and query workloads of varying difficulty. The results show that Hercules performs up to one order of magnitude faster than the best competitor (which is not always the same). Moreover, Hercules is the only index that outperforms the optimized scan on all scenarios, including the hard query workloads on disk-based datasets.