The attribute-trend-similarity method to improve learning performance for small datasets

Cited by: 58
Authors
Li, Der-Chiang [1]
Lin, Wu-Kuo [1]
Lin, Liang-Sian [1]
Chen, Chien-Chih [1]
Huang, Wen-Ting [1]
Affiliation
[1] Natl Cheng Kung Univ, Dept Ind & Informat Management, Tainan, Taiwan
Keywords
small data-sets; trend similarities of attributes; virtual sample generation; triangular membership functions; forecasting accuracies; neural networks; virtual samples; pilot runs; diffusion; information; models; parameters; examples
DOI
10.1080/00207543.2016.1213447
CLC number (Chinese Library Classification)
T [Industrial Technology]
Subject classification code
08
Abstract
Small-data-set learning problems are attracting more attention because increasing global competition is shortening product lifecycles. Although statistical approaches and machine learning algorithms are widely applied to extract information from such data, they are generally developed on the assumption that the training samples represent the properties of the whole population. When the training samples contain only limited information, however, the knowledge that learning algorithms extract from them may also be deficient. Virtual sample generation approaches, used as a form of data pretreatment, have proved effective for small-data-set problems. Taking the relationships among attributes into account in the value generation procedure, this research proposes a non-parametric process that learns the trend similarities among attributes and then uses them to estimate the ranges in which an attribute's values may lie when the values of the other attributes are given. The ranges of the virtual samples' attribute values are then estimated stepwise using triangular membership functions (MFs) built to represent the sample distribution of each attribute. In the experiment, two real cases are examined with four modelling tools: the M5' model tree (M5'), multiple linear regression, support vector regression and a back-propagation neural network. The results show that the forecasting accuracy of all four modelling tools improves when the training sets contain virtual samples. In addition, the proposed procedure yields significantly smaller predictive errors than other approaches.
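The abstract does not give the formulas behind its triangular membership functions or its attribute-trend-similarity step. As a rough illustration of the general idea of MF-based virtual sample generation only, the sketch below builds a triangular MF for a single attribute and draws virtual values from it by acceptance-rejection sampling. The choice of the sample minimum, mean and maximum as the MF's lower bound, peak and upper bound, and the names `triangular_mf`, `membership` and `generate_virtual_values`, are assumptions made for illustration. The sketch does not implement the paper's trend-similarity learning or its stepwise, conditional range estimation.

```python
import random
from statistics import mean


def triangular_mf(samples):
    """Return (a, c, b): lower bound, peak and upper bound of a triangular MF.

    Assumption (not taken from the paper): a = min, c = mean, b = max.
    """
    a, b = min(samples), max(samples)
    c = mean(samples)
    return a, c, b


def membership(x, a, c, b):
    """Degree of membership of x in the triangular MF with support [a, b] and peak c."""
    if x < a or x > b:
        return 0.0
    if x == c:
        return 1.0
    if x < c:
        return (x - a) / (c - a)  # rising edge; c > a here
    return (b - x) / (b - c)      # falling edge; b > c here


def generate_virtual_values(samples, n, rng=None):
    """Draw n hypothetical virtual values for one attribute by acceptance-rejection."""
    if rng is None:
        rng = random.Random()
    a, c, b = triangular_mf(samples)
    values = []
    while len(values) < n:
        x = rng.uniform(a, b)  # candidate drawn uniformly over the MF support
        # Keep the candidate with probability equal to its membership degree.
        if rng.random() < membership(x, a, c, b):
            values.append(x)
    return values


if __name__ == "__main__":
    attribute = [3.1, 3.4, 3.9, 4.2, 5.0]  # hypothetical attribute values
    print(generate_virtual_values(attribute, n=5, rng=random.Random(0)))
```

Because candidates are drawn uniformly over [a, b] and kept with probability equal to their membership degree, the accepted values follow a triangular density peaked at c and never leave the observed range. A method that models relationships among attributes, as the paper describes, would instead narrow each attribute's range according to the values already generated for the other attributes.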
Pages: 1898-1913
Page count: 16