Constrained Semi-Supervised Growing Self-Organizing Map

Cited by: 9
Authors
Allahyar, Amin [1]
Yazdi, Hadi Sadoghi [1, 2]
Harati, Ahad [1]
Affiliations
[1] Ferdowsi Univ Mashhad, Dept Comp Engn, Mashhad, Iran
[2] Ferdowsi Univ Mashhad, Ctr Excellence Soft Comp & Intelligent Informat P, Mashhad, Iran
Keywords
Constrained clustering; Online learning; Semi-supervised Self-Organizing Map; Bregman's projection; Metric learning; NETWORK;
DOI
10.1016/j.neucom.2014.06.039
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Semi-supervised clustering tries to surpass the limits of unsupervised clustering by using the extra information contained in occasional labeled data points. However, providing such labeled samples is not always possible or easy in real-world applications. A weaker, yet still very useful, option is to provide constraints on the unlabeled training samples, which is the focus of Constrained Semi-Supervised (CSS) clustering. Meanwhile, online learning has attracted considerable interest for real-world problems with massive sample sizes or streaming behavior, because limited memory and computational resources seriously restrict the use of offline and batch methods. The existing algorithms for the online CSS clustering problem, however, either assume that the entire dataset is available and add constraints incrementally, or take chunks of constrained data points and apply an offline CSS clustering algorithm to them. Thus, none of them can be categorized as a genuine online CSS clustering algorithm. In this paper, we propose CS2GS, an online CSS clustering algorithm. CS2GS is constructed by modifying the online learning process of the Semi-Supervised Growing Self-Organizing Map and converting it into an iterative constrained metric learning problem that can be solved using Bregman's iterative projections. The proposed CS2GS is studied through a series of thorough tests on synthetic and real data, including selections from the UCI datasets and FEP, a recent bilingual corpus used in the sentence-alignment stage of machine translation. Experimental results show the effectiveness of CS2GS in online CSS clustering and show that the limits of system accuracy can indeed be pushed higher using unlabeled samples. © 2014 Elsevier B.V. All rights reserved.
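As a rough illustration of the idea named in the abstract (constrained metric learning solved by Bregman projections), the sketch below applies one projection step to a Mahalanobis matrix for a single pairwise constraint. It is not the CS2GS algorithm from the paper. It assumes a LogDet-divergence rank-one update of the kind used in information-theoretic metric learning, and the function names `project_pair` and `apply_constraint` and the thresholds `upper` and `lower` are hypothetical choices made for this example. The dual-variable corrections that a full cyclic Bregman projection scheme carries for inequality constraints are omitted.

```python
import numpy as np

def project_pair(A, x, y, target):
    """LogDet Bregman projection of A onto {A : (x-y)^T A (x-y) = target}.

    Assumed rank-one form: A' = A + beta * A v v^T A, with v = x - y.
    """
    v = x - y
    p = float(v @ A @ v)           # current squared distance under A
    if p <= 0.0:                   # identical points: nothing to project
        return A
    beta = (target - p) / (p * p)  # chosen so that v^T A' v == target
    Av = A @ v                     # A is symmetric, so A v v^T A = outer(Av, Av)
    return A + beta * np.outer(Av, Av)

def apply_constraint(A, x, y, must_link, upper=1.0, lower=4.0):
    """Process one constraint as it arrives, updating A only if it is violated.

    upper / lower are hypothetical squared-distance thresholds for
    must-link and cannot-link pairs respectively.
    """
    v = x - y
    d = float(v @ A @ v)
    if must_link and d > upper:
        return project_pair(A, x, y, upper)
    if (not must_link) and d < lower:
        return project_pair(A, x, y, lower)
    return A
```

In this simplified form each constraint is handled once, when it arrives, which matches the streaming setting described in the abstract only loosely. The updated matrix stays positive definite as long as the target distance is positive.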
Pages: 456-471
Number of pages: 16