MIX: A Joint Learning Framework for Detecting Both Clustered and Scattered Outliers in Mixed-Type Data

Cited by: 14
Authors
Xu, Hongzuo
Wang, Yijie [1]
Wang, Yongjun
Wu, Zhiyue [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China
Source
2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019) | 2019
Funding
Science Foundation of the Ministry of Education of China; National Natural Science Foundation of China;
Keywords
Outlier Detection; Mixed-Type Data; Joint Learning; Unsupervised Learning;
DOI
10.1109/ICDM.2019.00182
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Mixed-type data are pervasive in real life, but few outlier detection methods are available for them. Some existing methods handle mixed-type data by feature conversion, but their performance is degraded by the information loss and noise that the transformation introduces. Other approaches evaluate outlierness separately in numerical and categorical features. However, they fail to adequately consider the behaviours of data objects in the other feature space, which often leads to suboptimal results. As for outlier form, many real-world datasets contain both clustered and scattered outliers, but a number of outlier detectors are inherently restricted by their outlier definitions and cannot detect both simultaneously. To address these issues, an unsupervised outlier detection method, MIX, is proposed. MIX constructs a joint learning framework that establishes a cooperation mechanism, so that the separate outlier scoring processes constantly communicate and each sufficiently grasps the behaviours of data objects in the other feature space. Specifically, MIX iteratively performs outlier scoring in the numerical and categorical spaces, and each outlier scoring phase is iteratively and cooperatively enhanced by the prior knowledge provided by the other feature space. To target both clustered and scattered outliers, the outlier scoring phases capture the essential characteristic of outliers, i.e., they evaluate outlierness via the deviation from the normal model. We show that MIX significantly outperforms eight state-of-the-art outlier detectors on twelve real-world datasets and scales well.
Pages: 1408-1413
Number of pages: 6