IceBerg: Deep Generative Modeling for Constraint Discovery and Anomaly Detection

Cited by: 1
Authors
Hu, Wentao [1]
Jiang, Dawei [1]
Wu, Sai [1]
Chen, Ke [1]
Chen, Gang [1]
Affiliations
[1] Zhejiang Univ, Key Lab Big Data Intelligent Comp Zhejiang Prov, Hangzhou, Zhejiang, Peoples R China
Source
2022 IEEE INTL CONF ON PARALLEL & DISTRIBUTED PROCESSING WITH APPLICATIONS, BIG DATA & CLOUD COMPUTING, SUSTAINABLE COMPUTING & COMMUNICATIONS, SOCIAL COMPUTING & NETWORKING, ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM | 2022
Funding
National Natural Science Foundation of China
Keywords
Deep Generative Modeling; Constraint Discovery; Intelligent Auditing; Anomaly Detection; Databases
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00017
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic constraint discovery from a relational database is beneficial for domain experts in fraud detection and intelligent auditing. Its objective is to discover a set of inherent constraints underlying the database such that tuples violating them are considered anomalous. In this paper, we propose IceBerg as the first system to simultaneously detect anomalous tuples and discover the associated human-readable constraints. The backbone of IceBerg is a novel generative network, namely KD-VAE, that integrates Kernel Density estimation with a Variational AutoEncoder. KD-VAE is expected to learn the distribution of normal tuples. Anomalous tuples are detected by calculating the likelihood that a tuple fits the distribution of normal tuples, and each anomaly is interpreted by comparing the detected anomalous tuple with its generated normal counterpart. We empirically compare the proposed method with several state-of-the-art outlier detection methods on 13 real-world datasets. The results show that IceBerg outperforms its competitors in most cases, especially for complex datasets with high-dimensional features.
Pages: 74-81
Number of pages: 8