Incremental semi-supervised learning on streaming data

Cited by: 34
Authors
Li, Yanchao [1 ]
Wang, Yongli [1 ]
Liu, Qi [3 ]
Bi, Cheng [2 ]
Jiang, Xiaohui [1 ]
Sun, Shurong [4 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Jiangsu, Peoples R China
[2] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
[3] Edinburgh Napier Univ, Sch Comp, Edinburgh EH10 5DT, Midlothian, Scotland
[4] Zhenjiang Anal InfoTech Ltd, Zhenjiang 212100, Peoples R China
Funding
EU Horizon 2020; National Natural Science Foundation of China;
Keywords
Semi-supervised learning; Dynamic feature learning; Streaming data; Classification; REPRESENTATIONS; CLASSIFICATION; FRAMEWORK; ALGORITHM;
DOI
10.1016/j.patcog.2018.11.006
CLC number
TP18 [Theory of artificial intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In streaming data classification, most existing methods assume that all arriving, evolving data are completely labeled. In many applications, however, only a small amount of labeled examples is available for training. Incremental semi-supervised learning algorithms have been proposed for regularizing neural networks by incorporating various kinds of side information, such as pairwise constraints or user-provided labels. However, they are hard to put into practice, especially in non-stationary environments, because of their limited effectiveness and parameter sensitivity. In this paper, we propose a novel incremental semi-supervised learning framework for streaming data. Each layer of the model comprises a generative network, a discriminant structure, and a bridge between them. The generative network uses dynamic feature learning based on autoencoders, which have demonstrated their potential for learning latent feature representations, to learn generative features from streaming data. The discriminant structure regularizes the network construction by building pairwise similarity and dissimilarity constraints; it also facilitates the parameter learning of the generative network. The network and the structure are integrated into a joint learning framework and bridged by enforcing the correlation of their parameters, which balances flexible incorporation of supervision information against numerical tractability in non-stationary environments while exploring the intrinsic data structure. Moreover, an efficient algorithm is designed to solve the proposed optimization problem, and an ensemble method is also given. In particular, when multiple layers of the model are stacked, performance is significantly boosted. Finally, to validate the effectiveness of the proposed method, extensive experiments are conducted on synthetic and real-life datasets.
The experimental results demonstrate that the performance of the proposed algorithms is superior to some state-of-the-art approaches. (C) 2018 Elsevier Ltd. All rights reserved.
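The joint objective sketched in the abstract — autoencoder reconstruction (the generative network) regularized by pairwise similarity/dissimilarity constraints on the latent codes (the discriminant structure) — can be illustrated roughly as follows. This is a minimal single-layer sketch, not the paper's exact formulation; the function names and the `lam` and `margin` parameters are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(W, b, X):
    # Latent codes via a sigmoid encoder.
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def joint_loss(W, b, X, must_link, cannot_link, lam=0.1, margin=1.0):
    """Reconstruction loss (generative network) plus a constraint term
    (discriminant structure) that pulls similar pairs together and pushes
    dissimilar pairs at least `margin` apart in latent space."""
    H = encode(W, b, X)
    X_hat = H @ W.T                      # tied-weight linear decoder
    recon = np.mean((X - X_hat) ** 2)
    pull = sum(np.sum((H[i] - H[j]) ** 2) for i, j in must_link)
    push = sum(max(0.0, margin - np.sum((H[i] - H[j]) ** 2))
               for i, j in cannot_link)
    return recon + lam * (pull + push)

# Tiny streaming chunk: 6 samples, 4 features, one pair of each constraint type.
X = rng.normal(size=(6, 4))
W = rng.normal(scale=0.1, size=(4, 3))
b = np.zeros(3)
loss = joint_loss(W, b, X, must_link=[(0, 1)], cannot_link=[(0, 2)])
print(loss)
```

In a streaming setting, an objective of this shape would be re-minimized (or incrementally updated) on each arriving data chunk, with `lam` trading off the unsupervised reconstruction term against the supervised constraint term.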
Pages: 383-396
Page count: 14
References (68 total)
[1] [Anonymous], 2005, P INT WORKSH ART INT
[2] [Anonymous], P 5 INT C LEARN REPR
[3] [Anonymous], 2007, IEEE INT C ICML
[4] Barreto A., 2012, ADV NEURAL INFORM PR, V25, P1484
[5] Belkin M., 2006, J MACH LEARN RES, V7, P2399
[6] Bengio Y., 2006, ADV NEURAL INFORM PR, V19, P153, DOI 10.5555/2976456.2976476
[7] Bilenko M., 2004, P 21 INT C MACH LEAR, P11, DOI 10.1145/1015330.1015360
[8] Cauwenberghs G., 2001, ADV NEUR IN, V13, P409
[9] Cesa-Bianchi N., Conconi A., Gentile C., 2004, On the generalization ability of on-line learning algorithms, IEEE T INFORM THEORY, V50, N9, P2050-2057
[10] Chapelle O., 2009, Semi-Supervised Learning, V20, P542, DOI 10.1109/TNN.2009.2015974