66 references in total
- [1] Spatio-Temporal Dynamics and Semantic Attribute Enriched Visual Encoding for Video Captioning [C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 12479-12488
- [3] Alwassel H. Proceedings of the International Conference on Neural Information Processing Systems (NeurIPS), 2020: 9758
- [4] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018: 6077-6086
- [5] ViViT: A Video Vision Transformer [C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 6816-6826
- [6] Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 1708-1718
- [7] Hierarchical Boundary-Aware Neural Encoder for Video Captioning [C]. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 3185-3194
- [8] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [C]. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 4724-4733
- [9] Chen D. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), 2011: 190
- [10] Chen J.W. AAAI Conference on Artificial Intelligence (AAAI), 2019: 8167