98 entries in total
- [81] Vinyals O, 2015, PROC CVPR IEEE, P3156, DOI 10.1109/CVPR.2015.7298935
- [82] Reconstruction Network for Video Captioning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7622-7631
- [83] M3: Multimodal Memory Modelling for Video Captioning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 7512-7520
- [84] Wang L. M., 2016, P ECCV, V9912, P20
- [85] Learning Deep Structure-Preserving Image-Text Embeddings [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 5005-5013
- [86] Interpretable Video Captioning via Trajectory Structured Localization [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6829-6837
- [87] Aggregated Residual Transformations for Deep Neural Networks [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 5987-5995
- [88] Learning Multimodal Attention LSTM Networks for Video Captioning [J]. PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017: 537-545
- [89] MSR-VTT: A Large Video Description Dataset for Bridging Video and Language [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 5288-5296
- [90] Xu K, 2015, PR MACH LEARN RES, V37, P2048