Attention and self-attention in random forests

Cited by: 0
Authors
Lev V. Utkin
Andrei V. Konstantinov
Stanislav R. Kirpichenko
Affiliations
[1] Peter the Great St. Petersburg Polytechnic University, Higher School of Artificial Intelligence
Source
Progress in Artificial Intelligence | 2023 / Volume 12
Keywords
Attention mechanism; Random forest; Nadaraya–Watson regression; Quadratic programming; Linear programming; Contamination model;
DOI
Not available
Abstract
New random forest models that jointly use the attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest, whose idea stems from applying a combination of Nadaraya–Watson kernel regression and Huber's contamination model to random forests. The self-attention aims to capture dependencies among the tree predictions and to remove noisy or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module that computes the weights. It is shown that training the attention weights reduces to solving a single quadratic or linear optimization problem. Three modifications of the self-attention are proposed and compared. A specific multi-head self-attention for the random forest is also considered; the heads are obtained by changing the tuning parameters, including the kernel parameters and the contamination parameter of the models. The proposed combinations of attention and self-attention are verified and compared with other random forest models on several datasets. The code implementing the corresponding algorithms is publicly available.
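
To make the aggregation scheme described in the abstract concrete, the following minimal Python sketch (not the authors' published implementation) illustrates the general idea under stated assumptions: each tree's prediction is weighted by a Nadaraya–Watson attention score computed from the distance between the input and that tree's leaf prototype (the mean of the training points falling into the same leaf), and the softmax scores are mixed with a trainable weight vector through an epsilon-contamination. The names leaf_prototypes, epsilon, tau and v are illustrative; in the paper the trainable weights are found by quadratic or linear programming, whereas here they are simply fixed to uniform values.

# Minimal sketch of an epsilon-contaminated Nadaraya-Watson attention over random forest trees.
# Illustrative only; names and defaults (epsilon, tau, leaf prototypes) are assumptions.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

def leaf_prototypes(tree, X_train, y_train):
    """Mean feature vector and mean target of the training points in each leaf of one tree."""
    leaves = tree.apply(X_train)
    protos, targets = {}, {}
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        protos[leaf] = X_train[mask].mean(axis=0)
        targets[leaf] = y_train[mask].mean()
    return protos, targets

def attention_forest_predict(forest, protos_list, targets_list, X, v, epsilon=0.3, tau=1.0):
    """Aggregate per-tree predictions with contaminated Nadaraya-Watson attention weights."""
    n_trees = len(forest.estimators_)
    preds = np.zeros((X.shape[0], n_trees))
    scores = np.zeros((X.shape[0], n_trees))
    for k, tree in enumerate(forest.estimators_):
        leaves = tree.apply(X)
        for i, leaf in enumerate(leaves):
            preds[i, k] = targets_list[k][leaf]
            # Gaussian-kernel score from the distance to the tree's leaf prototype
            scores[i, k] = -np.sum((X[i] - protos_list[k][leaf]) ** 2) / tau
    softmax = np.exp(scores - scores.max(axis=1, keepdims=True))
    softmax /= softmax.sum(axis=1, keepdims=True)
    # Huber-style epsilon-contamination: mix kernel weights with trainable weights v
    weights = (1.0 - epsilon) * softmax + epsilon * v
    return (weights * preds).sum(axis=1)

X, y = make_friedman1(n_samples=300, noise=1.0, random_state=0)
forest = RandomForestRegressor(n_estimators=20, random_state=0).fit(X, y)
protos_list, targets_list = zip(*(leaf_prototypes(t, X, y) for t in forest.estimators_))
v = np.full(forest.n_estimators, 1.0 / forest.n_estimators)  # trained by QP/LP in the paper
print(attention_forest_predict(forest, protos_list, targets_list, X[:5], v))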
Pages: 257-273
Number of pages: 16