共 66 条
[1]
Abacha A.B., 2019, PROC C LABS EVAL FO, V2
[2]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[3]
Arbelle A, 2021, Arxiv, DOI arXiv:2104.09829
[4]
Emerging Properties in Self-Supervised Vision Transformers
[J].
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021),
2021,
:9630-9640
[5]
Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:4042-4050
[6]
CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency
[J].
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019),
2019,
:1791-1800
[7]
Chen ZF, 2019, Arxiv, DOI arXiv:1906.02549
[8]
Das A., 2016, C EMPIRICAL METHODS
[9]
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
[J].
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019),
2019,
:2601-2610
[10]
VirTex: Learning Visual Representations from Textual Annotations
[J].
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021,
2021,
:11157-11168