41 entries in total
[1] Multi-Level Multimodal Common Semantic Space for Image-Phrase Grounding [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 12468-12478
[2] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[3] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[4] G3RAPHGROUND: Graph-based Language Grounding [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4280-4289
[5] Soft-NMS - Improving Object Detection With One Line of Code [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 5562-5570
[6] Knowledge Aided Consistency for Weakly Supervised Phrase Grounding [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 4042-4050
[7] Chen L, 2021, AAAI CONF ARTIF INTE, V35, P1036
[8] Chen T, 2020, PR MACH LEARN RES, V119
[9] Chen XL, 2015, arXiv, arXiv:1504.00325
[10] Dai B, 2017, arXiv, arXiv:1710.02534, DOI 10.48550/arXiv.1710.02534