40 entries in total
[21]
Li JY, 2021, ADV NEUR IN, V34
[22]
Visual Semantic Reasoning for Image-Text Matching [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4653-4661
[23]
Learning Dynamic Routing for Semantic Segmentation [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 8550-8559
[24]
Microsoft COCO: Common Objects in Context [J]. COMPUTER VISION - ECCV 2014, PT V, 2014, 8693: 740-755
[25]
Graph Structured Network for Image-Text Matching [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 10918-10927
[26]
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 15671-15680
[27]
CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising [J]. PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021: 5600-5608
[29]
Song L., 2020, P ADV NEUR INF PROC, V33, P11131
[30]
Vaswani A, 2017, ADV NEUR IN, V30