A survey on batch training in genetic programming

Cited by: 0
Authors
Rosenfeld, Liah [1]
Vanneschi, Leonardo [1]
Affiliations
[1] Univ Nova Lisboa, NOVA Informat Management Sch NOVA IMS, Campus Campolide, P-1070312 Lisbon, Portugal
Keywords
Genetic programming; Batch training; Sampling methods; Generalization; Overfitting; DATA PARALLELISM;
DOI
10.1007/s10710-024-09501-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
In Machine Learning (ML), the use of subsets of training data, referred to as batches, rather than the entire dataset, has been extensively researched to reduce computational costs, improve model efficiency, and enhance algorithm generalization. Despite this extensive research, a clear definition of and consensus on what constitutes batch training have yet to be reached, leading to a fragmented body of literature that could otherwise be seen as different facets of a unified methodology. To address this gap, we propose a theoretical redefinition of batch training, creating a clearer and broader overview that integrates diverse perspectives. We then apply this refined concept specifically to Genetic Programming (GP). Although batch training techniques have been explored in GP, the term itself is seldom used, resulting in ambiguity regarding its application in this area. This review seeks to clarify the existing literature on batch training by presenting a new and practical classification system, which we further explore within the specific context of GP. We also investigate the use of dynamic batch sizes in ML, emphasizing the relatively limited research on dynamic or adaptive batch sizes in GP compared to other ML algorithms. By bringing greater coherence to previously disjointed research efforts, we aim to foster further scientific exploration and development. Our work highlights key considerations for researchers designing batch training applications in GP and offers an in-depth discussion of future research directions, challenges, and opportunities for advancement.
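The core idea surveyed above — evaluating fitness on a per-generation sample of the training data rather than the full dataset, optionally with a batch size that grows over time — can be illustrated with a minimal Python sketch. This is not the paper's method or classification; the function names, the toy "programs" (plain Python callables standing in for GP trees), and the doubling batch-size schedule are all illustrative assumptions.

```python
import random

def sample_batch(dataset, batch_size, rng):
    """Draw a random mini-batch (without replacement) from the full dataset."""
    return rng.sample(dataset, min(batch_size, len(dataset)))

def batch_fitness(program, batch):
    """Mean squared error of a candidate program on the current batch only."""
    return sum((program(x) - y) ** 2 for x, y in batch) / len(batch)

# Toy setup: candidates are plain callables standing in for GP trees;
# the target function is f(x) = 2x, so the first candidate is exact.
dataset = [(x, 2 * x) for x in range(100)]
population = [lambda x: 2 * x, lambda x: x + 1, lambda x: 3 * x]

rng = random.Random(42)
batch_size = 8
for generation in range(5):
    batch = sample_batch(dataset, batch_size, rng)      # fresh batch each generation
    scores = [batch_fitness(p, batch) for p in population]
    # Illustrative dynamic schedule: double the batch size, capped at the full dataset.
    batch_size = min(2 * batch_size, len(dataset))
```

In a real GP loop, `scores` would feed selection and variation; the point here is only that each generation sees a different, cheaper-to-evaluate subset, while the schedule lets later generations approach full-dataset fitness.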
Pages: 28