41 entries in total
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[3]
Banerjee S., 2005, P ACL WORKSHOP INTRI, P65
[4]
Beltagy I, 2020, Arxiv, DOI [arXiv:2004.05150, 10.48550/arXiv.2004.05150]
[6]
Chen C.-F., 2021, P INT C LEARN REPR, P1
[7]
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 347-356
[8]
Child R, 2019, Arxiv, DOI arXiv:1904.10509
[9]
CAPTIONING CHANGES IN BI-TEMPORAL REMOTE SENSING IMAGES [J]. 2021 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM IGARSS, 2021: 2891-2894
[10]
Chu XX, 2021, ADV NEUR IN