94 entries in total
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 6077-6086
[2]
Ba JL, 2016, Layer normalization
[3]
G3raphGround: Graph-based Language Grounding [C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 4280-4289
[4]
Bolme DS, 2010, PROC CVPR IEEE, P2544, DOI 10.1109/CVPR.2010.5539960
[5]
Carion N, 2020, Img Proc Comp Vis Re, V12346, P213, DOI 10.1007/978-3-030-58452-8_13
[6]
Query-guided Regression Network with Context Policy for Phrase Grounding [C]. 2017 IEEE International Conference on Computer Vision (ICCV), 2017: 824-832
[7]
Chen LC, 2016, Arxiv, arXiv:1412.7062
[8]
Chen L, 2021, AAAI CONF ARTIF INTE, V35, P1036
[9]
Chen M, 2020, PR MACH LEARN RES, V119
[10]
Multi-Modal Dynamic Graph Transformer for Visual Grounding [C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 15513-15522