A novel virtual sample generation method based on Gaussian distribution

Times cited: 124
Authors
Yang, Jing [1]
Yu, Xu [1]
Xie, Zhi-Qiang [1, 2]
Zhang, Jian-Pei [1]
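Affiliations
[1] Harbin Engineering University, College of Computer Science and Technology, Harbin, People's Republic of China
[2] Harbin University of Science and Technology, College of Computer Science and Technology, Harbin, People's Republic of China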
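Funding
China Postdoctoral Science Foundation; Natural Science Foundation of Heilongjiang Province; National Natural Science Foundation of China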
Keywords
Virtual sample; Regularization theory; Cost-sensitive learning; Gaussian distribution; Prior knowledge
DOI
10.1016/j.knosys.2010.12.010
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Traditional machine learning algorithms do not generalize satisfactorily when trained on noisy, imbalanced, or small training sets. In this work, a novel virtual sample generation (VSG) method based on the Gaussian distribution is proposed. First, the method determines the mean and the standard error of the Gaussian distribution. Then, virtual samples are generated from this Gaussian distribution. Finally, a new training set is constructed by adding the virtual samples to the original training set. This work shows that training on the new training set is equivalent to a form of regularization for small-sample problems, and to cost-sensitive learning for imbalanced-sample problems. Experiments show that, given a suitable number of virtual sample replicates, classifiers trained on the new training sets can generalize better than those trained on the original training sets. (C) 2011 Published by Elsevier B.V.
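The three steps in the abstract (fix the Gaussian parameters, draw virtual samples, append them to the original training set) can be sketched as follows. The abstract does not say how the mean and standard error are chosen. This sketch therefore assumes a per-feature Gaussian centred on each original sample. Its standard deviation is assumed to equal that feature's standard deviation within the sample's class, multiplied by a hypothetical `sigma_scale` factor. The function names and the `replicates` parameter are also hypothetical illustrations, not the authors' implementation.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple, Union

Row = List[float]


def class_feature_stds(X: Sequence[Sequence[float]], y: Sequence[int]) -> Dict[int, Row]:
    """Per-class, per-feature sample standard deviation (0.0 for singleton classes)."""
    by_class: Dict[int, List[Sequence[float]]] = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    stds: Dict[int, Row] = {}
    for label, rows in by_class.items():
        if len(rows) < 2:
            stds[label] = [0.0] * len(rows[0])
        else:
            # zip(*rows) iterates over the feature columns of this class
            stds[label] = [statistics.stdev(col) for col in zip(*rows)]
    return stds


def generate_virtual_samples(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    replicates: Union[int, Dict[int, int]],
    sigma_scale: float = 1.0,
    seed: Optional[int] = None,
) -> Tuple[List[Row], List[int]]:
    """Draw virtual samples from a Gaussian centred on each original sample (assumed scheme)."""
    rng = random.Random(seed)
    stds = class_feature_stds(X, y)
    virtual_X: List[Row] = []
    virtual_y: List[int] = []
    for x, label in zip(X, y):
        # replicates may be one count for all classes or a per-class mapping
        k = replicates.get(label, 0) if isinstance(replicates, dict) else replicates
        for _ in range(k):
            virtual_X.append([rng.gauss(mu, sigma_scale * s) for mu, s in zip(x, stds[label])])
            virtual_y.append(label)  # a virtual sample inherits its original's label
    return virtual_X, virtual_y


def augment_training_set(X, y, replicates, sigma_scale=1.0, seed=None):
    """New training set = the original samples followed by their virtual replicates."""
    vX, vy = generate_virtual_samples(X, y, replicates, sigma_scale, seed)
    return [list(x) for x in X] + vX, list(y) + vy


if __name__ == "__main__":
    X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 5.0], [5.3, 4.8]]
    y = [0, 0, 0, 1, 1]
    # Replicate the smaller class (label 1) more often than the larger one
    new_X, new_y = augment_training_set(X, y, replicates={0: 2, 1: 3}, seed=0)
    print(len(new_X))  # 17 rows: 5 originals + 3*2 + 2*3 virtual samples
```

The usage example passes a per-class `replicates` mapping so that the smaller class receives more virtual copies. This loosely mirrors the abstract's cost-sensitive reading for imbalanced data, because each extra replicate increases that class's share of the training loss. Setting `sigma_scale=0.0` reduces the scheme to plain duplication of the original samples.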
Pages: 740-748
Number of pages: 9