53 entries in total
[31] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows [J]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV 2021), 2021: 9992-10002
[32] Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation [J]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10031-10040
[33] The Stanford CoreNLP Natural Language Processing Toolkit [J]. Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014: 55-60
[34] Generation and Comprehension of Unambiguous Object Descriptions [J]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 11-20
[35] Margffoy-Tuay E, 2018, arXiv, arXiv:1807.02257
[36] Mikolov T, 2013, North American Chapter of the Association for Computational Linguistics
[37] Pennington J., 2014, P 2014 C EMPIRICAL M, P1532, DOI 10.3115/v1/D14-1162
[39] Radford A, 2021, PR MACH LEARN RES, V139