One-class classification - From theory to practice: A case-study in radioactive threat detection

Cited by: 7
Authors
Bellinger, Colin [1]
Sharma, Shiven [2]
Japkowicz, Nathalie [3]
Affiliations
[1] Univ Alberta, Comp Sci, Edmonton, AB, Canada
[2] Fluent Solut Inc, Ottawa, ON, Canada
[3] Amer Univ, Washington, DC 20016 USA
Keywords
One-class classification; Imbalanced data; Multiple classifier systems; Small disjuncts; Within-class imbalance; ONE-CLASS SVM; SUPPORT;
DOI
10.1016/j.eswa.2018.05.009
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Over the years, the acceptance of machine learning as a valuable tool in the real world has caused much interest in the research community; this is particularly the case as the field of Big Data is coming into prominence. However, real-world data comes with a myriad of challenges, amongst the most prominent of which is the fact that it can exhibit a high level of imbalance. This can come in the form of both within- and between-class imbalance. While a significant amount of research has been devoted to the impact of within-class imbalance on binary classifiers, very little attention has been given to its impact on one-class classifiers, which are typically used in situations of extreme between-class imbalance. During our collaboration with Health Canada on the identification of anomalous gamma-ray spectra, the issue of within-class imbalance in a one-class classification setting was highly significant. In this setting, the imbalance comes from the fact that the background data that we wish to model is composed of two concepts (background no-rain and rain); the rain sub-concept is rare and corresponds to spectra affected by the presence of water in the environment. In this article, we present our work on developing systems for detecting anomalous gamma-ray spectra that are able to handle both the inherent between-class and within-class imbalance present in the domain. We test and validate our system on data provided to us by Health Canada from three sites across Canada. Our results indicate that oversampling the sub-concept improves the performance of the baseline classifiers and the multiple classifier system when measured by the geometric mean of the per-class accuracy. © 2018 Elsevier Ltd. All rights reserved.
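The evaluation measure named in the abstract, the geometric mean of the per-class accuracy, can be illustrated with a minimal sketch. The function names, class labels, and toy predictions below are hypothetical and are not taken from the paper; the sketch only shows why this measure is less forgiving than plain accuracy when one class is rare.

```python
import math


def per_class_accuracy(y_true, y_pred, label):
    """Fraction of instances whose true class is `label` that were predicted as `label`."""
    idx = [i for i, t in enumerate(y_true) if t == label]
    if not idx:
        raise ValueError(f"no instances of class {label!r}")
    return sum(1 for i in idx if y_pred[i] == label) / len(idx)


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class accuracies over all classes present in y_true."""
    labels = sorted(set(y_true))
    accs = [per_class_accuracy(y_true, y_pred, c) for c in labels]
    return math.prod(accs) ** (1.0 / len(accs))


# Hypothetical toy data: 8 background spectra and 2 anomalous spectra.
y_true = ["background"] * 8 + ["anomaly"] * 2
# All background instances are classified correctly; one anomaly is missed.
y_pred = ["background"] * 8 + ["anomaly", "background"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = g_mean(y_true, y_pred)
```

In this toy case the plain accuracy is 0.9, while the geometric mean is the square root of 1.0 × 0.5, roughly 0.71, because the missed anomaly halves the accuracy on the rare class.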
Pages: 223-232
Page count: 10