On sampling error in genetic programming

被引:7
作者
Schweim, Dirk [1 ]
Wittenberg, David [1 ]
Rothlauf, Franz [1 ]
机构
[1] Johannes Gutenberg Univ Mainz, Jakob Welder Weg 9, D-55128 Mainz, Germany
关键词
Sampling error; Initial supply; Genetic programming; Building blocks; Initial population; Ramped half-and-half; Full; Grow; n-Grams; SCHEMA THEORY; ALGORITHMS; CROSSOVER;
D O I
10.1007/s11047-020-09828-w
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The initial population in genetic programming (GP) should form a representative sample of all possible solutions (the search space). While large populations accurately approximate the distribution of possible solutions, small populations tend to incorporate a sampling error. This paper analyzes how the size of a GP population affects the sampling error and contributes to answering the question of how to size initial GP populations. First, we present a probabilistic model of the expected number of subtrees for GP populations initialized with full, grow, or ramped half-and-half. Second, based on our frequency model, we present a model that estimates the sampling error for a given GP population size. We validate our models empirically and show that, compared to smaller population sizes, our recommended population sizes largely reduce the sampling error of measured fitness values. Increasing the population sizes even more, however, does not considerably reduce the sampling error of fitness values. Last, we recommend population sizes for some widely used benchmark problem instances that result in a low sampling error. A low sampling error at initialization is necessary (but not sufficient) for a reliable search since lowering the sampling error means that the overall random variations in a random sample are reduced. Our results indicate that sampling error is a severe problem for GP, making large initial population sizes necessary to obtain a low sampling error. Our model allows practitioners of GP to determine a minimum initial population size so that the sampling error is lower than a threshold, given a confidence level.
引用
收藏
页码:173 / 186
页数:14
相关论文
共 39 条
  • [1] Analysis of Schema Frequencies in Genetic Programming
    Burlacu, Bogdan
    Affenzeller, Michael
    Kommenda, Michael
    Kronberger, Gabriel
    Winkler, Stephan
    [J]. COMPUTER AIDED SYSTEMS THEORY - EUROCAST 2017, PT I, 2018, 10671 : 432 - 438
  • [2] Building Blocks Identification Based on Subtree Sample Counts for Genetic Programming
    Burlacu, Bogdan
    Kommenda, Michael
    Affenzeller, Michael
    [J]. 2015 ASIA-PACIFIC CONFERENCE ON COMPUTER-AIDED SYSTEM ENGINEERING - APCASE 2015, 2015, : 152 - 157
  • [3] Cochran W.G., 1977, Sampling Techniques
  • [4] De Jong K.A., 1975, An analysis of the bevavior of a class of genetic adaptive systems
  • [5] Fortin FA, 2012, J MACH LEARN RES, V13, P2171
  • [6] Goldberg D. E., 1991, Complex Systems, V5, P265
  • [7] Goldberg D. E., 1992, Complex Systems, V6, P333
  • [8] Goldberg D. E., 1987, Genetic Algorithms and their Applications: Proceedings of the Second International Conference on Genetic Algorithms, P1
  • [9] Goldberg D.E., 1989, GENETIC ALGORITHMS S
  • [10] Goldberg D.E., 2002, DESIGN INNOVATION LE, DOI [10.1007/978-1-4757-3643-4, DOI 10.1007/978-1-4757-3643-4]