55 entries in total
[1] Ballas Nicolas, 2015, Delving Deeper into Convolutional Networks for Learning Video Representations
[2] Attention Augmented Convolutional Networks [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 3285-3294
[3] Object Detection in Video with Spatiotemporal Sampling Networks [J]. COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216: 342-357
[4] nuScenes: A multimodal dataset for autonomous driving [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 11618-11628
[5] End-to-End Object Detection with Transformers [J]. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229
[7] Chen YL, 2019, IEEE I CONF COMP VIS, P9774, DOI 10.1109/ICCV.2019.00987
[8] Chung J., 2014, arXiv
[9] Dauphin YN, 2017, PR MACH LEARN RES, V70
[10] Deng JJ, 2021, AAAI CONF ARTIF INTE, V35, P1201