215 entries in total
- [1] To React or not to React: End-to-End Visual Pose Forecasting for Personalized Avatar during Dyadic Conversations [J]. ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, 2019: 74-84
- [2] Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 12468-12478
- [3] Aktas B., 2018, Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference, P1
- [4] Alamri H, 2018, DSTC7 AAAI2019 WORKS, V2
- [5] Ammanabrolu P, 2020, INT C LEARNING REPRE
- [6] [Anonymous], 2022, TEXT TO TEXT TRANSFE
- [7] [Anonymous], 2022, COCO DATASET
- [8] [Anonymous], TEXT VIDEO EARLY ACC
- [9] [Anonymous], 2019, Advances in neural information processing systems
- [10] [Anonymous], 2022, MULTI30K