Unsupervised Active Learning Based on Hierarchical Graph-Theoretic Clustering

Cited by: 31

Authors:

Hu, Weiming [1]

Hu, Wei [1]

Xie, Nianhua [1]

Maybank, Steve [2]

Affiliations:

[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100080, Peoples R China

[2] Univ London, Birkbeck Coll, Sch Comp Sci & Informat Syst, London WC1E 7HX, England

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS | 2009 / Vol. 39 / No. 5

Funding:

National Science Foundation (United States);

Keywords:

Active learning; dominant-set clustering; image and video classification; network intrusion detection; spectral clustering; COMMITTEE; QUERY;

DOI:

10.1109/TSMCB.2009.2013197

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Most existing active learning approaches are supervised. Supervised active learning has the following problems: inefficiency in dealing with the semantic gap between the distribution of samples in the feature space and their labels, lack of ability in selecting new samples that belong to new categories that have not yet appeared in the training samples, and lack of adaptability to changes in the semantic interpretation of sample categories. To tackle these problems, we propose an unsupervised active learning framework based on hierarchical graph-theoretic clustering. In the framework, two promising graph-theoretic clustering algorithms, namely, dominant-set clustering and spectral clustering, are combined in a hierarchical fashion. Our framework has some advantages, such as ease of implementation, flexibility in architecture, and adaptability to changes in the labeling. Evaluations on data sets for network intrusion detection, image classification, and video classification have demonstrated that our active learning framework can effectively reduce the workload of manual classification while maintaining a high accuracy of automatic classification. It is shown that, overall, our framework outperforms the support-vector-machine-based supervised active learning, particularly in terms of dealing much more efficiently with new samples whose categories have not yet appeared in the training samples.


Pages: 1147-1161

Page count: 15
