Attention and self-attention in random forests

Cited by: 0
Authors
Lev V. Utkin
Andrei V. Konstantinov
Stanislav R. Kirpichenko
Affiliations
[1] Peter the Great St. Petersburg Polytechnic University, Higher School of Artificial Intelligence
Source
Progress in Artificial Intelligence | 2023 / Volume 12
Keywords
Attention mechanism; Random forest; Nadaraya–Watson regression; Quadratic programming; Linear programming; Contamination model;
DOI
Not available
Abstract
New random forest models that jointly use the attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest, whose idea stems from applying a combination of Nadaraya–Watson kernel regression and Huber's contamination model to random forests. The self-attention aims to capture dependencies among the tree predictions and to remove noisy or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module that computes the weights. It is shown that training the attention weights reduces to solving a single quadratic or linear optimization problem. Three modifications of the self-attention are proposed and compared. A specific multi-head self-attention for the random forest is also considered; the heads are obtained by changing the tuning parameters, including the kernel parameters and the contamination parameter of the models. The proposed combinations of attention and self-attention are verified and compared with other random forest models on several datasets. The code implementing the corresponding algorithms is publicly available.
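
To make the aggregation scheme described in the abstract concrete, the following minimal Python sketch (not the authors' published implementation) illustrates the general idea under stated assumptions: each tree's prediction is weighted by a Nadaraya–Watson attention score computed from the distance between the input and that tree's leaf prototype (the mean of the training points falling into the same leaf), and the softmax scores are mixed with a trainable weight vector through an epsilon-contamination. The names leaf_prototypes, epsilon, tau and v are illustrative; in the paper the trainable weights are found by quadratic or linear programming, whereas here they are simply fixed to uniform values.

# Minimal sketch of an epsilon-contaminated Nadaraya-Watson attention over random forest trees.
# Illustrative only; names and defaults (epsilon, tau, leaf prototypes) are assumptions.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

def leaf_prototypes(tree, X_train, y_train):
    """Mean feature vector and mean target of the training points in each leaf of one tree."""
    leaves = tree.apply(X_train)
    protos, targets = {}, {}
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        protos[leaf] = X_train[mask].mean(axis=0)
        targets[leaf] = y_train[mask].mean()
    return protos, targets

def attention_forest_predict(forest, protos_list, targets_list, X, v, epsilon=0.3, tau=1.0):
    """Aggregate per-tree predictions with contaminated Nadaraya-Watson attention weights."""
    n_trees = len(forest.estimators_)
    preds = np.zeros((X.shape[0], n_trees))
    scores = np.zeros((X.shape[0], n_trees))
    for k, tree in enumerate(forest.estimators_):
        leaves = tree.apply(X)
        for i, leaf in enumerate(leaves):
            preds[i, k] = targets_list[k][leaf]
            # Gaussian-kernel score from the distance to the tree's leaf prototype
            scores[i, k] = -np.sum((X[i] - protos_list[k][leaf]) ** 2) / tau
    softmax = np.exp(scores - scores.max(axis=1, keepdims=True))
    softmax /= softmax.sum(axis=1, keepdims=True)
    # Huber-style epsilon-contamination: mix kernel weights with trainable weights v
    weights = (1.0 - epsilon) * softmax + epsilon * v
    return (weights * preds).sum(axis=1)

X, y = make_friedman1(n_samples=300, noise=1.0, random_state=0)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
protos_list, targets_list = zip(*(leaf_prototypes(t, X, y) for t in forest.estimators_))
v = np.full(forest.n_estimators, 1.0 / forest.n_estimators)  # trained by QP/LP in the paper
print(attention_forest_predict(forest, protos_list, targets_list, X[:5], v))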
Pages: 257-273
Number of pages: 16