Adaptivity in continuous massively parallel distance-based outlier detection

Cited by: 1
Authors
Toliopoulos, Theodoros [1]
Gounaris, Anastasios [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Thessaloniki, Greece
Funding
EU Horizon 2020
Keywords
Adaptive outlier detection; Dynamic partitioning; Massively parallel processing; Flink; State management; Stream; Algorithms
DOI
10.1007/s00607-022-01101-5
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We deal with the problem of dynamically allocating the workload to multiple workers in massively parallel continuous distance-based outlier detection, where the workload is conceptually split into contiguous overlapping regions. The main challenges are threefold: modern stream processing frameworks, such as Apache Flink and Spark Streaming, do not support feedback loops; the process is stateful; and the adaptations do not redistribute keys but instead modify the region boundaries associated with each key. These challenges correspond to overlooked issues, which call for novel solutions that we provide in our work. First, we propose an architecture that allows such adaptations in Flink. Second, we propose specific techniques for adaptive region definition that are applicable to any distance metric. Finally, we conduct a thorough experimental evaluation, and our results show that our proposal is both efficient and effective even in small finite streams. In addition, our proposal is shown to be insensitive to the exact continuous outlier detection algorithm and to the outlier query parameters.
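The idea of contiguous overlapping regions with adjustable boundaries can be illustrated with a minimal single-process sketch. It is not the authors' Flink architecture or their partitioning technique: it assumes 1-D numeric points, the common definition of a distance-based outlier (fewer than `k` neighbours within `radius`), and an equal-count quantile rule for moving boundaries. All function names here are hypothetical.

```python
from bisect import bisect_right


def assign_regions(points, boundaries, radius):
    """Split 1-D points into contiguous regions [lo, hi) given sorted boundaries.

    Each region gets its own 'core' points plus 'support' points that lie
    within `radius` of the region, which is what makes the regions overlap.
    """
    n_regions = len(boundaries) + 1
    regions = {i: ([], []) for i in range(n_regions)}
    for p in points:
        home = bisect_right(boundaries, p)
        regions[home][0].append(p)
        for i in range(n_regions):
            if i == home:
                continue
            lo = boundaries[i - 1] if i > 0 else float("-inf")
            hi = boundaries[i] if i < len(boundaries) else float("inf")
            # Replicate p if it could be a neighbour of some core point of region i.
            if lo - radius <= p < hi + radius:
                regions[i][1].append(p)
    return regions


def outliers_in_region(core, support, radius, k):
    """Report core points with fewer than k neighbours within radius."""
    candidates = core + support
    result = []
    for q in core:
        # Subtract 1 so that q does not count as its own neighbour.
        neighbours = sum(1 for p in candidates if abs(p - q) <= radius) - 1
        if neighbours < k:
            result.append(q)
    return result


def detect(points, boundaries, radius, k):
    """Run detection region by region, as independent workers would."""
    found = []
    for core, support in assign_regions(points, boundaries, radius).values():
        found.extend(outliers_in_region(core, support, radius, k))
    return sorted(found)


def rebalance(points, n_regions):
    """Assumed adaptation rule: place boundaries at equal-count quantiles."""
    ordered = sorted(points)
    step = len(ordered) / n_regions
    return [ordered[int(i * step)] for i in range(1, n_regions)]
```

In this sketch, calling `rebalance` changes only the boundary list; the number of regions stays fixed, which mirrors the abstract's point that adaptation modifies region boundaries rather than redistributing keys. Because support points are replicated across boundaries, `detect` reports the same outliers before and after the boundaries move.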
Pages: 2659-2684
Page count: 26