共 51 条
[11]
Every Picture Tells a Story: Generating Sentences from Images
[J].
COMPUTER VISION-ECCV 2010, PT IV,
2010, 6314
:15-+
[13]
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data
[J].
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2016,
:1-10
[14]
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[15]
Speed/accuracy trade-offs for modern convolutional object detectors
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:3296-+
[16]
Fast and Accurate Content-based Semantic Search in 100M Internet Videos
[J].
MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE,
2015,
:49-58
[17]
Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos
[J].
ICMR'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL,
2015,
:27-34
[18]
Karpathy A, 2015, PROC CVPR IEEE, P3128, DOI 10.1109/CVPR.2015.7298932
[19]
Kingma D. P., 2015, P 3 INT C LEARN REPR
[20]
Kiros R, 2014, PR MACH LEARN RES, V32, P595