55 entries in total
[41]
Su Weijie, 2020, INT C LEARN REPR
[42]
VideoBERT: A Joint Model for Video and Language Representation Learning [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7463-7472
[43]
Tan H, 2019, 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019), P5100
[44]
van den Oord Aaron, 2018, ARXIV
[45]
Vedantam R, 2015, PROC CVPR IEEE, P4566, DOI 10.1109/CVPR.2015.7299087
[46]
Unsupervised Feature Learning via Non-Parametric Instance Discrimination [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 3733-3742
[47]
Discriminatively Embedded K-Means for Multi-view Clustering [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 5356-5364
[48]
Describing Videos by Exploiting Temporal Structure [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 4507-4515
[49]
Yao T, 2021, AAAI CONF ARTIF INTE, V35, P10656
[50]
Hierarchy Parsing for Image Captioning [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 2621-2629