59 entries in total
[11] Multiscale Vision Transformers[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 6804-6815
[12] Temporally-Aware Feature Pooling for Action Spotting in Soccer Broadcasts[J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 4485-4494
[13] ActionVLAD: Learning spatio-temporal aggregation for action classification[J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 3165-3174
[14] CMT: Convolutional Neural Networks Meet Vision Transformers[J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022: 12165-12175
[15] Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition[J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 14136-14147
[16] Deep Residual Learning for Image Recognition[J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 770-778
[17] Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI 10.1162/neco.1997.9.8.1735
[18] Hu J., 2018, PROC IEEE/CVF C COMPU
[19] Learning Spatio-Temporal Representations with Temporal Squeeze Pooling[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 2103-2107
[20] Ioffe Sergey, 2015, Proceedings of Machine Learning Research, V37, P448