[52] Yang Z, 2023, AAAI CONF ARTIF INTE, P3222
[53] LAVT: Language-Aware Vision Transformer for Referring Image Segmentation. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), 2022: 18134-18144.
[54] Cross-Modal Self-Attention Network for Referring Image Segmentation. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 10494-10503.
[55] MAttNet: Modular Attention Network for Referring Expression Comprehension. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 1307-1315.
[57] Zhang DW, 2024, arXiv, arXiv:2410.03987
[60] VinVL: Revisiting Visual Representations in Vision-Language Models. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021: 5575-5584.