Constrained Clustering with Seeds and Term Weighting Scheme

Cited by: 0
Authors
Buatoom, Uraiwan [1]
Kongprawechnon, Waree [1]
Theeramunkong, Thanaruk [1,2]
Affiliations
[1] Thammasat Univ, Sirindhorn Int Inst Technol, Pathum Thani, Thailand
[2] Royal Soc Thailand, Bangkok, Thailand
Source
2018 THIRTEENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE, INFORMATION AND CREATIVITY SUPPORT SYSTEMS (KICSS) | 2018
Keywords
Semi-supervised; Term weighting; Distribution class; Ambiguity class; Seeded k-means; SEMI-SUPERVISED CLASSIFICATION
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional unsupervised learning is blind, and its performance depends on the choice of initial seeds. Constrained clustering instead uses a small number of labeled instances to partially guide the clustering of a large number of unlabeled instances. It focuses on a set of predefined classes, with the aim of using constraints to improve the performance of supervised and unsupervised learning. This paper proposes a new semi-supervised learning approach based on seeded constrained clustering, in which the clustering guidance comes from statistics computed on a small set of labeled data. Existing seeded K-Means approaches specify the labeled instances themselves as seeds; in contrast, the proposed work investigates how term weighting obtained from a training set affects the seeded-clustering results. Experiments cover three groups of term-weighting statistics (in-collection, intra-class, and inter-class, based on term frequencies and distributions) together with an ambiguity class based on entropy values. The experiments are conducted on text datasets. The results also show that the term weighting scheme is a potential means to control or guide the initialization and clustering processes, compared with a standard term weighting scheme.
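The abstract contrasts seeded K-Means, where labeled instances serve directly as seeds, with seeding guided by term weights learned from the labeled set. The sketch below illustrates both ideas in plain Python under stated assumptions. Initial centroids are the class means of the labeled seeds, as in standard Seeded K-Means. Each term is weighted by one minus the normalized entropy of its class distribution in the labeled set, so a term shared across classes (an "ambiguous" term) is down-weighted. That entropy-based weight is a hypothetical stand-in for the paper's in-collection, intra-class, inter-class, and ambiguity-class statistics and is not the authors' formula. The function names (entropy_term_weights, seeded_kmeans, and so on) and the toy corpus are likewise illustrative.

```python
import math
from collections import Counter

def entropy_term_weights(seed_docs, seed_labels):
    """HYPOTHETICAL weighting (not the paper's formula): 1 minus the normalized
    entropy of each term's frequency distribution over the labeled classes.
    Class-specific terms get weight near 1; ambiguous terms get weight near 0."""
    classes = sorted(set(seed_labels))
    if len(classes) < 2:
        raise ValueError("need labeled seeds from at least two classes")
    vocab = sorted({t for doc in seed_docs for t in doc})
    weights = {}
    for term in vocab:
        # Total frequency of `term` within each class of the labeled seed set.
        per_class = [sum(doc[term] for doc, y in zip(seed_docs, seed_labels) if y == c)
                     for c in classes]
        total = sum(per_class)  # > 0, since every vocab term occurs in some seed
        h = -sum((f / total) * math.log(f / total) for f in per_class if f > 0)
        weights[term] = 1.0 - h / math.log(len(classes))
    return vocab, weights

def vectorize(doc, vocab, weights):
    # Weighted term-frequency vector; terms outside the seed vocabulary are dropped.
    return [doc[t] * weights[t] for t in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def seeded_kmeans(vectors, seed_idx, seed_labels, max_iter=20):
    """Seeded K-Means: each class's initial centroid is the mean of its labeled
    seed vectors; after that, ordinary K-Means iterations run over all points."""
    classes = sorted(set(seed_labels))
    centroids = [mean([vectors[i] for i, y in zip(seed_idx, seed_labels) if y == c])
                 for c in classes]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(classes)), key=lambda k: sq_dist(v, centroids[k]))
                      for v in vectors]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        for k in range(len(classes)):
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:  # keep the previous centroid if a cluster empties
                centroids[k] = mean(members)
    return [classes[a] for a in assign]

texts = [
    "news goal match team score",     # seed: sports
    "team player goal win",           # seed: sports
    "news stock market price trade",  # seed: finance
    "bank price market loan",         # seed: finance
    "match score player win",         # unlabeled
    "loan bank stock trade",          # unlabeled
]
docs = [Counter(t.split()) for t in texts]
seed_idx = [0, 1, 2, 3]
seed_labels = ["sports", "sports", "finance", "finance"]

vocab, weights = entropy_term_weights([docs[i] for i in seed_idx], seed_labels)
vectors = [vectorize(d, vocab, weights) for d in docs]
predicted = seeded_kmeans(vectors, seed_idx, seed_labels)
print(weights["news"], weights["goal"])  # "news" occurs in both classes
print(predicted)  # ['sports', 'sports', 'finance', 'finance', 'sports', 'finance']
```

In this toy run, "news" appears once in each class and so receives weight 0, which removes it from the distance computation. Swapping in a different weighting function changes only `entropy_term_weights`; the seeding and clustering steps stay the same, which mirrors the abstract's framing of weighting as the factor under study.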
Pages: 99-104
Number of pages: 6