Robust Variational Autoencoders for Outlier Detection and Repair of Mixed-Type Data

Cited by: 0
Authors
Eduardo, Simao [1]
Nazabal, Alfredo [2]
Williams, Christopher K. I. [1, 2]
Sutton, Charles [1, 2, 3]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[2] Alan Turing Inst, London, England
[3] Google Res, Mountain View, CA USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108 | 2020 / Vol. 108
Funding
Engineering and Physical Sciences Research Council (UK);
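Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract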
We focus on the problem of unsupervised cell outlier detection and repair in mixed-type tabular data. Traditional methods are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells are corrupted in a specific row is an important problem in practice, and the very first step towards repairing them. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells, allowing their imputation (repair). RVAE explicitly learns the probability of each cell being an outlier, balancing different likelihood models in the row outlier score, making the method suitable for outlier detection in mixed-type datasets. We show experimentally that RVAE not only performs better than several state-of-the-art methods in cell outlier detection and repair for tabular data, but is also robust to the initial hyper-parameter selection.
Pages: 4056 - 4065
Number of pages: 10