共 41 条
- [1] MUTAN: Multimodal Tucker Fusion for Visual Question Answering [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 2631 - 2639
- [2] MUREL: Multimodal Relational Reasoning for Visual Question Answering [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1989 - 1998
- [3] Chen JY, 2018, 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), P162
- [4] Query-guided Regression Network with Context Policy for Phrase Grounding [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 824 - 832
- [5] Chen SX, 2019, AAAI CONF ARTIF INTE, P8199
- [6] Cho K., 2014, P C EMP METH NAT LAN, P1724, DOI DOI 10.3115/V1/D14-1179
- [7] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
- [8] Learning Spatiotemporal Features with 3D Convolutional Networks [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 4489 - 4497
- [9] Faghri F, 2018, P BRIT MACH VIS C
- [10] TALL: Temporal Activity Localization via Language Query [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5277 - 5285