Learning over subconcepts: Strategies for 1-class classification

Cited by: 14
Authors
Sharma, Shiven [1]
Somayaji, Anil [2]
Japkowicz, Nathalie [1, 3]
Affiliations
[1] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
[2] Carleton Univ, Sch Comp Sci, Ottawa, ON, Canada
[3] Amer Univ, 4400 Massachusetts Ave NW, Washington, DC 20016 USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
anomaly detection; classification; machine learning; 1-class classification; ENSEMBLES; SYSTEM
DOI
10.1111/coin.12128
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In machine learning research and application, multiclass classification algorithms reign supreme. Their fundamental property is the reliance on the availability of data from all known categories to induce effective classifiers. Unfortunately, data from so-called real-world domains sometimes do not satisfy this property, and researchers use methods such as sampling to make the data more conducive for classification. However, there are scenarios in which even such explicit methods to rectify distributions fail. In such cases, 1-class classification algorithms become the practical alternative. Unfortunately, domain complexity severely impacts their ability to produce effective classifiers. The work in this article addresses this issue and develops a strategy that allows for 1-class classification over complex domains. In particular, we introduce the notion of learning along the lines of underlying domain concepts; an important source of complexity in domains is the presence of subconcepts, and by learning over them explicitly rather than on the entire domain as a whole, we can produce powerful 1-class classification systems. The level of knowledge regarding these subconcepts will naturally vary by domain, and thus, we develop 3 distinct methodologies that take the amount of domain knowledge available into account. We demonstrate these over 3 real-world domains.
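As a rough illustration of the idea in the abstract (not the authors' method), the sketch below contrasts one 1-class model fitted to the whole target class with one model per subconcept. Everything in it is an assumption made for illustration: the synthetic two-cluster data, the toy centroid-plus-radius model, the helper names, and the premise that subconcept labels are already known from domain knowledge.

```python
import math
import random
from statistics import mean

def fit_sphere(points, quantile=0.95):
    """Toy 1-class model (assumed for illustration): centroid plus distance threshold."""
    centroid = tuple(mean(axis) for axis in zip(*points))
    dists = sorted(math.dist(p, centroid) for p in points)
    radius = dists[min(len(dists) - 1, int(quantile * len(dists)))]
    return centroid, radius

def accepts(model, x):
    """True if x falls inside the model's sphere."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

def fit_over_subconcepts(points, labels):
    """Fit one model per subconcept; labels are assumed to be known in advance."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return [fit_sphere(g) for g in groups.values()]

def accepts_any(models, x):
    """Accept x as a target-class member if any subconcept model accepts it."""
    return any(accepts(m, x) for m in models)

rng = random.Random(0)
# Synthetic target class made of two well-separated subconcepts.
a = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(200)]
b = [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(200)]
points, labels = a + b, [0] * 200 + [1] * 200

global_model = fit_sphere(points)                   # learns the domain as a whole
sub_models = fit_over_subconcepts(points, labels)   # learns each subconcept separately

outlier = (5.0, 5.0)  # lies between the two subconcepts, far from both
print(accepts(global_model, outlier), accepts_any(sub_models, outlier))
```

In this toy setting the single global sphere has to span both clusters, so it also covers the empty region between them, while the per-subconcept models stay tight around each cluster. When subconcept labels are not available, they would have to be estimated, for example by clustering.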
Pages: 440-467
Page count: 28