Classification Under Streaming Emerging New Classes: A Solution Using Completely-Random Trees

Cited by: 97
Authors
Mu, Xin [1]
Ting, Kai Ming [2]
Zhou, Zhi-Hua [1]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] Federat Univ, Sch Engn & Informat Technol, Ballarat, Vic 3350, Australia
Funding
U.S. National Science Foundation
Keywords
Data stream; emerging new class; ensemble method; anomaly detection; completely-random trees; NOVELTY DETECTION; SUPPORT;
DOI
10.1109/TKDE.2017.2691702
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates an important problem in stream mining, i.e., classification under streaming emerging new classes or SENC. The SENC problem can be decomposed into three subproblems: detecting emerging new classes, classifying known classes, and updating models to integrate each new class as part of known classes. The common approach is to treat it as a classification problem and solve it using either a supervised learner or a semi-supervised learner. We propose an alternative approach by using unsupervised learning as the basis to solve this problem. The proposed method employs completely-random trees which have been shown to work well in unsupervised learning and supervised learning independently in the literature. The completely-random trees are used as a single common core to solve all three subproblems: unsupervised learning, supervised learning, and model update on data streams. We show that the proposed unsupervised-learning-focused method often achieves significantly better outcomes than existing classification-focused methods.
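The three subproblems named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `build_tree`, `find_leaf`, and `SencSketch`, the fixed `threshold` on mean leaf depth, the `buffer_size` trigger, the integer label assigned to a new class, and the full retraining on stored data are all assumptions made for illustration. The sketch only shows how one forest of completely-random trees could serve detection (short paths suggest an unseen class), classification (label votes at the leaves), and model update.

```python
import random
from collections import Counter


def build_tree(X, y, rng, depth, max_depth):
    """Grow one completely-random tree: random attribute, random cut point."""
    leaf = {"depth": depth, "labels": Counter(y)}
    if len(X) <= 1 or depth >= max_depth:
        return leaf
    q = rng.randrange(len(X[0]))                  # attribute chosen at random
    lo, hi = min(x[q] for x in X), max(x[q] for x in X)
    p = rng.uniform(lo, hi)                       # cut point chosen at random
    left = [i for i, x in enumerate(X) if x[q] < p]
    right = [i for i, x in enumerate(X) if x[q] >= p]
    if not left or not right:                     # no usable split: stop here
        return leaf
    return {
        "q": q,
        "p": p,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           rng, depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            rng, depth + 1, max_depth),
    }


def find_leaf(tree, x):
    """Route x down the tree with the same rule used during growth."""
    while "labels" not in tree:
        tree = tree["left"] if x[tree["q"]] < tree["p"] else tree["right"]
    return tree


class SencSketch:
    """Hypothetical wrapper: one forest reused for all three subproblems."""

    def __init__(self, n_trees=25, max_depth=64, threshold=2.0,
                 buffer_size=3, seed=0):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.threshold, self.buffer_size = threshold, buffer_size
        self.rng = random.Random(seed)
        self.X, self.y, self.buffer, self.trees = [], [], [], []

    def fit(self, X, y):
        # Assumption: training data is kept so the forest can be rebuilt.
        self.X, self.y = list(X), list(y)
        self.trees = [build_tree(self.X, self.y, self.rng, 0, self.max_depth)
                      for _ in range(self.n_trees)]

    def mean_depth(self, x):
        return sum(find_leaf(t, x)["depth"] for t in self.trees) / len(self.trees)

    def process(self, x):
        if self.mean_depth(x) < self.threshold:        # 1. detect new class
            self.buffer.append(x)
            if len(self.buffer) >= self.buffer_size:   # 3. update the model
                new_label = max(self.y) + 1            # assumes integer labels
                self.fit(self.X + self.buffer,
                         self.y + [new_label] * len(self.buffer))
                self.buffer = []
            return "NEW"
        votes = Counter()                              # 2. classify known class
        for t in self.trees:
            votes.update(find_leaf(t, x)["labels"])
        return votes.most_common(1)[0][0]
```

In this sketch an instance is flagged as belonging to an emerging class when its mean leaf depth falls below `threshold`; once `buffer_size` flagged instances accumulate, they are given a fresh label and the forest is rebuilt. A fixed threshold and a full rebuild keep the example short; a streaming method would normally derive the threshold from the training data and bound its memory use.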
Pages: 1605-1618
Page count: 14