A hybrid generative/discriminative method for semi-supervised classification

Cited by: 32
Authors
Jiang, Zhen [1]
Zhang, Shiyong [1]
Zeng, Jianping [1]
Affiliation
[1] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
U.S. National Science Foundation; National Natural Science Foundation of China;
Keywords
Co-training; Hybrid generative/discriminative methods; Naive Bayes; Support vector machine; Classification; Class imbalance;
DOI
10.1016/j.knosys.2012.07.020
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Training methods for machine learning are often characterized as being generative or discriminative. We present a new co-training-style algorithm that employs a generative classifier (Naive Bayes) and a discriminative classifier (Support Vector Machine) as base classifiers, to take advantage of both approaches. Furthermore, we introduce a pair of weight parameters to balance the impact of labeled and pseudo-labeled data, and define a hybrid objective function to tune their values during co-training. The final prediction is given by a combination of the base classifiers, and we define a pseudo-validation set to regulate their weights. Additionally, we present a strategy for selecting pseudo-labeled data that deals with the class imbalance problem. Experimental results on six datasets show that our method performs much better in practice, especially when the amount of labeled data is small. (C) 2012 Elsevier B.V. All rights reserved.
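The abstract describes the method only at a high level. As a rough illustration, two of the ingredients it names can be sketched in Python for a binary task: choosing pseudo-labeled examples so that the minority class keeps a share, and weighting the Naive Bayes and SVM outputs using a pseudo-validation set. Everything concrete here is an assumption, not the paper's procedure: the function names `select_balanced` and `combine_weight`, the proportional quota rule, the 0.5 decision threshold, the grid search over the combination weight, and the premise that the SVM output has been calibrated into a probability (for example by Platt scaling). The labeled/pseudo-labeled weight pair and the hybrid objective function are not modeled.

```python
from typing import List, Sequence, Tuple


def select_balanced(
    p_pos: Sequence[float], pos_ratio: float, budget: int
) -> List[Tuple[int, int]]:
    """Pick up to `budget` unlabeled examples to pseudo-label (labels 0/1).

    p_pos[i] is one base classifier's estimate of P(y = 1 | x_i).
    Hypothetical quota rule: the budget is split between the classes in
    proportion to `pos_ratio` (e.g. the positive rate in the labeled data),
    so the minority class is not crowded out by the majority class.
    Returns (example_index, pseudo_label) pairs.
    """
    n_pos = max(1, round(budget * pos_ratio))
    n_neg = max(1, budget - n_pos)
    # Indices ordered from most confidently negative to most confidently positive.
    order = sorted(range(len(p_pos)), key=lambda i: p_pos[i])
    # The 0.5 thresholds keep the positive and negative picks disjoint.
    pos = [(i, 1) for i in order[::-1][:n_pos] if p_pos[i] >= 0.5]
    neg = [(i, 0) for i in order[:n_neg] if p_pos[i] < 0.5]
    return pos + neg


def combine_weight(
    p_nb: Sequence[float],
    p_svm: Sequence[float],
    val_labels: Sequence[int],
    steps: int = 11,
) -> Tuple[float, float]:
    """Grid-search w in [0, 1] for the ensemble score w * p_nb + (1 - w) * p_svm.

    Accuracy is measured on a pseudo-validation set supplied by the caller
    (how that set is built is an open assumption here). p_svm must already be
    a probability; raw SVM margins are not on a 0-1 scale.
    Returns (best_w, best_accuracy); ties keep the smallest w.
    """
    if not val_labels:
        raise ValueError("pseudo-validation set is empty")
    if steps < 2:
        raise ValueError("steps must be at least 2")
    best_w, best_acc = 0.0, -1.0
    for k in range(steps):
        w = k / (steps - 1)
        # Count ensemble predictions that match the pseudo-validation labels.
        correct = sum(
            int((w * a + (1 - w) * b >= 0.5) == (y == 1))
            for a, b, y in zip(p_nb, p_svm, val_labels)
        )
        acc = correct / len(val_labels)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc
```

For instance, with `budget=4` and `pos_ratio=0.25`, `select_balanced` gives one slot to the most confident positive prediction and three to the most confident negatives, so a rare positive class still receives pseudo-labels in every round. In a co-training loop, each base classifier's selections would be added to the other classifier's training data before retraining.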
Pages: 137-145
Number of pages: 9
References (28 in total)
[1] Agarwal A., 2009, INT JOINT C ART INT
[2] Akbani R., Kwek S., Japkowicz N., 2004, Applying support vector machines to imbalanced datasets, MACHINE LEARNING: ECML 2004, PROCEEDINGS, V3201, P39-50
[3] Angluin D., 1988, Machine Learning, V2, P343, DOI 10.1023/A:1022873112823
[4] [Anonymous], 2006, BOOK REV IEEE T NEUR
[5] [Anonymous], 1997, P 14 INT C MACHINE L
[6] [Anonymous], 2006, P 21 NAT C ART INT 1
[7] [Anonymous], 2006, PROC 23 INT C MACH L, DOI 10.1145/1143844.1143863
[8] [Anonymous], 2002, ADV NEURAL INFORM PR
[9] [Anonymous], 2003, Advances in neural information processing systems
[10] Blum A., Kalai A., Wasserman H., 2003, Noise-tolerant learning, the parity problem, and the statistical query model, JOURNAL OF THE ACM, V50(4), P506-519