共 40 条
[1]
Bello I, Zoph B, Le Q, Vaswani A, Shlens J., Attention augmented convolutional networks, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3286-3295, (2019)
[2]
Chen CFR, Fan Q, Panda R., Crossvit: cross-attention multi-scale vision transformer for image classification, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 357-366, (2021)
[3]
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N., An image is worth 16x16 words: transformers for image recognition at scale, (2020)
[4]
d'Ascoli S, Touvron H, Leavitt ML, Morcos AS, Biroli G, Sagun L., Convit: improving vision transformers with soft convolutional inductive biases, International Conference on Machine Learning, (2021)
[5]
Fan H, Xiong B, Mangalam K, Li Y, Yan Z, Malik J, Feichtenhofer C., Multiscale vision transformers, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6824-6835, (2021)
[6]
Guo J, Han K, Wu H, Tang Y, Chen X, Wang Y, Xu C., CMT: convolutional neural networks meet vision transformers, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12165-12175, (2022)
[7]
Guo MH, Lu CZ, Liu ZN, Cheng MM, Hu SM., Visual attention network, Computational Visual Media, 9, 4, pp. 733-752, (2022)
[8]
Han K, Xiao A, Wu E, Guo J, Xu C, Wang Y., Transformer in transformer, Advances in Neural Information Processing Systems, 34, pp. 15908-15919, (2021)
[9]
He K, Zhang X, Ren S, Sun J., Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, (2016)
[10]
Howard J, Gugger S., Fastai: a layered API for deep learning, Information, 11, 2, (2020)