56 entries in total
[31] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9992-10002
[32] Long J, 2015, PROC CVPR IEEE, P3431, DOI 10.1109/CVPR.2015.7298965
[33] Luo G., 2020, CVPR, P10034
[34] Cascade Grouped Attention Network for Referring Expression Segmentation [J]. MM '20: Proceedings of the 28th ACM International Conference on Multimedia, 2020: 1274-1282
[35] Comprehension-guided referring expressions [J]. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 3125-3134
[36] Generation and Comprehension of Unambiguous Object Descriptions [J]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 11-20
[37] Dynamic Multimodal Instance Segmentation Guided by Natural Language Queries [J]. Computer Vision - ECCV 2018, Pt XI, 2018, 11215: 656-672
[38] Pennington J., 2014, P 2014 C EMP METH NA, P1532, DOI 10.3115/V1/D14-1162
[39] Radford A, 2021, PR MACH LEARN RES, V139
[40] You Only Look Once: Unified, Real-Time Object Detection [J]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 779-788