17 entries in total
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 6077-6086
[3]
Bahdanau D, 2016, arXiv, DOI 10.48550/arXiv.1409.0473 (arXiv:1409.0473)
[4]
Bhatia Y., 2019, 2019 12 INT C CONT C, P1
[6]
Deng J, 2009, PROC CVPR IEEE, P248, DOI 10.1109/CVPRW.2009.5206848
[7]
Jyotsna A., 2023, Computer and Communication Engineering: Third International Conference, CCCE 2023, Revised Selected Papers. Communications in Computer and Information Science (1823), P95, DOI 10.1007/978-3-031-35299-7_8
[8]
Khan R, 2022, arXiv, arXiv:2203.01594
[9]
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks [J]. Computer Vision - ECCV 2020, Pt XXX, 2020, 12375: 121-137
[10]
Neural Baby Talk [J]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 7219-7228