The attribute-trend-similarity method to improve learning performance for small datasets

Cited by: 58

Authors:

Li, Der-Chiang ^{[1]}

Lin, Wu-Kuo ^{[1]}

Lin, Liang-Sian ^{[1]}

Chen, Chien-Chih ^{[1]}

Huang, Wen-Ting ^{[1]}

Affiliations:

[1] Natl Cheng Kung Univ, Dept Ind & Informat Management, Tainan, Taiwan

Source:

INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH | 2017 / Vol. 55 / Issue 07

Keywords:

small data-sets; trend similarities of attributes; virtual sample generation; triangular membership functions; forecasting accuracies; NEURAL-NETWORKS; VIRTUAL SAMPLES; PILOT RUNS; DIFFUSION; INFORMATION; MODELS; PARAMETERS; EXAMPLES;

DOI:

10.1080/00207543.2016.1213447

Chinese Library Classification:

T [Industrial Technology];

Subject classification code:

08;

Abstract:

Small data-set learning problems are attracting more attention because of the short product lifecycles caused by the increasing pressure of global competition. Although statistical approaches and machine learning algorithms are widely applied to extract information from such data, these are basically developed on the assumption that training samples can represent the properties of the whole population. However, as the properties that the training samples contain are limited, the knowledge that the learning algorithms extract may also be deficient. Virtual sample generation approaches, used as a kind of data pretreatment, have proved their effectiveness when handling small data-set problems. By considering the relationships among attributes in the value generation procedure, this research proposes a non-parametric process to learn the trend similarities among attributes, and then uses these to estimate the corresponding ranges that attribute values may be located in when other attribute values are given. The ranges of the attribute values of the virtual samples are then stepwise estimated using the triangular membership functions (MFs) built to represent the attribute sample distributions. In the experiment, two real cases are examined with four modelling tools, including the M5' model tree (M5'), multiple linear regression, support vector regression and back-propagation neural network. The results show that the forecasting accuracies of the four modelling tools are improved when training sets contain virtual samples. In addition, the outcomes of the proposed procedure show significantly smaller predictive errors than those of other approaches.
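The abstract mentions triangular membership functions as the device for representing attribute sample distributions. The following is only a minimal sketch of that general idea, not the authors' procedure. It assumes the observed minimum and maximum as the support and the sample mean as the apex, and it draws virtual values by simple acceptance-rejection. The function names `triangular_mf` and `generate_virtual_values` are hypothetical, and the trend-similarity step that the paper uses to estimate the ranges is not reproduced here.

```python
import random


def triangular_mf(x, low, peak, high):
    """Membership degree of x under a triangular MF with support [low, high] and apex at peak."""
    if x < low or x > high:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)


def generate_virtual_values(samples, n, rng=None):
    """Draw n virtual values for one attribute by acceptance-rejection under a triangular MF.

    Assumption (not from the paper): the support is the observed min/max
    and the apex is the sample mean.
    """
    rng = rng or random.Random(0)
    low, high = min(samples), max(samples)
    peak = sum(samples) / len(samples)
    values = []
    while len(values) < n:
        candidate = rng.uniform(low, high)
        # Keep the candidate with probability equal to its membership degree.
        if rng.random() <= triangular_mf(candidate, low, peak, high):
            values.append(candidate)
    return values


observed = [1.0, 2.0, 4.0]
virtual = generate_virtual_values(observed, 5)
```

Values near the apex are accepted more often than values near the ends of the support, so the virtual values concentrate where the membership degree is high.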


Pages: 1898-1913

Page count: 16
