Harnessing the Power of GPUs to Speed Up Feature Selection for Outlier Detection

Authors
Fatemeh Azmandian
Ayse Yilmazer
Jennifer G. Dy
Javed A. Aslam
David R. Kaeli
Affiliations
[1] Northeastern University, Department of Electrical and Computer Engineering
[2] Northeastern University, College of Computer and Information Science
Source
Journal of Computer Science and Technology | 2014 / Vol. 29
Keywords
feature selection; outlier detection; imbalanced data; GPU acceleration
Abstract
Acquiring a set of features that emphasizes the differences between normal data points and outliers can drastically facilitate the task of identifying outliers. In this work, we present a novel non-parametric evaluation criterion for filter-based feature selection that is designed with the final goal of outlier detection in mind. The proposed method seeks the subset of features that captures the inherent characteristics of the normal data while forcing outliers to stand out, making them easier for outlier detection algorithms to distinguish. Experimental results on real datasets show the advantage of our feature selection algorithm over popular and state-of-the-art methods. We also show that the proposed algorithm overcomes the small sample space problem and performs well on highly imbalanced datasets. Furthermore, because the feature selection is highly parallelizable, we implement the algorithm on a graphics processing unit (GPU) and obtain a significant speedup over the serial version. The benefits of the GPU implementation are twofold: its performance scales very well with both the number of features and the number of data points.
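The abstract does not spell out the evaluation criterion, so the following is only a rough illustration of the general idea: filter-based greedy forward selection driven by a non-parametric (kernel density) score that rewards subsets where normal points are dense and outliers fall in low-density regions. Everything here is an assumption for illustration, not the authors' method: the Gaussian kernel, the `bandwidth` value, the density-ratio score, and the hypothetical helper names `kernel`, `subset_score`, and `forward_select`. It also assumes labeled training data with at least two normal points and at least one outlier.

```python
import math


def kernel(a, b, features, bandwidth):
    """Gaussian kernel between rows a and b, restricted to the given feature columns."""
    sq_dist = sum((a[f] - b[f]) ** 2 for f in features)
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def subset_score(normals, outliers, features, bandwidth=1.0):
    """Hypothetical density-ratio criterion for a feature subset (an assumption).

    Numerator: mean leave-one-out kernel density of each normal point with
    respect to the other normal points. Denominator: mean kernel density of
    each outlier with respect to the normal points. Larger scores mean the
    normal data forms a dense group while outliers stand out.
    """
    n = len(normals)
    normal_density = 0.0
    for i, x in enumerate(normals):
        # Leave-one-out: skip the point itself so it cannot inflate its own density.
        normal_density += sum(
            kernel(x, y, features, bandwidth) for j, y in enumerate(normals) if j != i
        ) / (n - 1)
    normal_density /= n

    outlier_density = 0.0
    for o in outliers:
        outlier_density += sum(kernel(o, y, features, bandwidth) for y in normals) / n
    outlier_density /= len(outliers)

    # Small epsilon avoids division by zero when outliers are extremely isolated.
    return normal_density / (outlier_density + 1e-12)


def forward_select(normals, outliers, k, bandwidth=1.0):
    """Greedy sequential forward search: add the best-scoring feature each round."""
    num_features = len(normals[0])
    selected = []
    while len(selected) < min(k, num_features):
        candidates = [f for f in range(num_features) if f not in selected]
        # Each candidate subset is scored independently of the others; this is
        # the loop a parallel (e.g. GPU) version could spread across threads.
        scores = [subset_score(normals, outliers, selected + [f], bandwidth)
                  for f in candidates]
        selected.append(candidates[scores.index(max(scores))])
    return selected


# Toy data: feature 0 separates the outliers, feature 1 is wide noise.
normals = [[0.0, 0.0], [0.1, 5.0], [-0.1, -5.0], [0.05, 2.0]]
outliers = [[3.0, 0.1], [-3.0, -2.0]]
print(forward_select(normals, outliers, k=1))  # [0]
```

Because each candidate's score depends only on the current subset and the data, the candidate loop has no cross-iteration dependencies, and each score is itself a sum of independent pairwise kernel evaluations. That structure is what makes a density-based criterion of this kind a natural fit for GPU parallelization; this pure-Python sketch runs serially.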
Pages: 408–422
Number of pages: 14