52 entries in total
[11]
Convolutional Two-Stream Network Fusion for Video Action Recognition [C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 1933-1941
[12]
Feichtenhofer Christoph, 2018, arXiv
[13]
Ghosh P., 2018, arXiv
[14]
Globerson Amir, 2018, arXiv
[15]
YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-shot Recognition [C]. 2013 IEEE International Conference on Computer Vision (ICCV), 2013: 2712-2719
[16]
Cross Modal Distillation for Supervision Transfer [C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 2827-2836
[17]
He KM, 2020, IEEE T PATTERN ANAL, V42, P386, DOI [10.1109/TPAMI.2018.2844175, 10.1109/ICCV.2017.322]
[18]
Deep Residual Learning for Image Recognition [C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 770-778
[19]
Hinton G., 2015, arXiv:1503.02531
[20]
Joint Syntax Representation Learning and Visual Cue Translation for Video Captioning [C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 8917-8926