Constrained Clustering with Seeds and Term Weighting Scheme

Cited by: 0

Authors:

Buatoom, Uraiwan [1]

Kongprawechnon, Waree [1]

Theeramunkong, Thanaruk [1,2]

Affiliations:

[1] Thammasat Univ, Sirindhorn Int Inst Technol, Pathum Thani, Thailand

[2] Royal Soc Thailand, Bangkok, Thailand

Source:

2018 THIRTEENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE, INFORMATION AND CREATIVITY SUPPORT SYSTEMS (KICSS) | 2018

Keywords:

Semi-supervised; Term weighting; Distribution class; Ambiguity class and Seeded k-means; SEMI-SUPERVISED CLASSIFICATION;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditional unsupervised learning is blind, and its performance depends on the choice of initial seeds, whereas constrained clustering can use a small number of labeled instances to partly guide a large number of unlabeled instances. It focuses on a set of predefined classes, with the aim of increasing the performance of supervised and unsupervised learning by using constraints. This paper proposes a semi-supervised learning approach based on seeded constrained clustering, in which the clustering guidance comes from the statistics of a small set of labeled data. In contrast with existing seeded K-Means approaches, where the labeled instances are specified, the proposed work investigates how weighting obtained from a training set affects the seeded-clustering results. Experimental results are reported for three groups of term-weighting statistics (in-collection, intra-class, and inter-class), based on frequencies/distributions and on class ambiguity expressed as an entropy value. Text datasets are studied in the experiments. The results also indicate that the term weighting scheme is a potential means to control and guide the initialization and clustering process, compared with a standard term weighting scheme.
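The general idea of seeded k-means on weighted term vectors can be sketched as follows. This is only an illustrative sketch, not the authors' method: the abstract does not give the formulas for the in-collection, intra-class, or inter-class statistics, so plain smoothed TF-IDF is swapped in as a stand-in weighting. The function names `tfidf` and `seeded_kmeans`, the toy count matrix, and the choice of one seed per class are all assumptions made for illustration.

```python
import numpy as np

def tfidf(counts):
    """Weight a (docs x terms) count matrix by smoothed TF-IDF, then L2-normalise rows.

    Stand-in weighting only; the paper's own term-weighting statistics are not
    specified in the abstract.
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)             # document frequency per term
    idf = np.log((1.0 + n_docs) / (1.0 + df)) + 1.0   # smoothed inverse document frequency
    weighted = counts * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                           # leave all-zero rows unchanged
    return weighted / norms

def seeded_kmeans(X, seed_idx, seed_labels, n_clusters, n_iter=20):
    """K-means whose initial centroids are the means of the labeled seed documents.

    Assumes every cluster 0..n_clusters-1 has at least one seed.
    """
    seed_idx = np.asarray(seed_idx)
    seed_labels = np.asarray(seed_labels)
    centroids = np.vstack([
        X[seed_idx[seed_labels == k]].mean(axis=0) for k in range(n_clusters)
    ])
    assign = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every document to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break                                     # assignments stopped changing
        assign = new_assign
        for k in range(n_clusters):
            members = X[assign == k]
            if len(members) > 0:                      # keep old centroid if cluster is empty
                centroids[k] = members.mean(axis=0)
    return assign, centroids

# Hypothetical toy corpus: rows are documents, columns are term counts.
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 4],
    [0, 0, 1, 3],
])
X = tfidf(counts)
# Documents 0 and 3 act as the labeled seeds for classes 0 and 1.
labels, centroids = seeded_kmeans(X, seed_idx=[0, 3], seed_labels=[0, 1], n_clusters=2)
```

Replacing `tfidf` with a different weighting function changes both the seed centroids and the distances used in every iteration, which is the kind of effect the abstract says the paper investigates.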

Citation

Pages: 99-104

Page count: 6

14 references in total

[1] [Anonymous], 2001, ICML

[2] [Anonymous], 2002, ICML

[3] Davidson I, 2005, LECT NOTES ARTIF INT, V3721, P59

[4] Davidson I, 2006, LECT NOTES ARTIF INT, V4213, P115

[5] Dong, Aimei; Chung, Fu-lai; Wang, Shitong. Semi-supervised classification method through oversampling and common hidden space [J]. INFORMATION SCIENCES, 2016, 349: 216-228

[6] George A, 2013, INT ARAB J INF TECHN, V10, P467

[7] Khan, Fouad. An initial seed selection algorithm for k-means clustering of georeferenced data to improve replicability of cluster assignments for mapping application [J]. APPLIED SOFT COMPUTING, 2012, 12 (11): 3698-3700

[8] Klein D, 2002, TECH REP

[9] Lertnattee, V; Theeramunkong, T. Effect of term distributions on centroid-based text categorization [J]. INFORMATION SCIENCES, 2004, 158: 89-115

[10] Li X., 2015, J BIOINFORM INTELL C, V4, P111
