59 entries in total
- [1] Aloimono Y., 2011, P EMP METH NAT LANG
- [2] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018 : 6077 - 6086
- [3] [Anonymous], 2017, PMLR
- [4] Banerjee S., 2005, P ACL WORKSH INTR EX, P65, DOI 10.3115/1626355.1626389
- [5] Betti F., 2020, P 13 INT C NAT LANG, P29
- [6] Bianco S, 2023, Arxiv, DOI arXiv:2306.11593
- [7] Biswas K, 2022, Arxiv, DOI arXiv:2111.04682
- [8] Bochkovskiy A, 2020, Arxiv, DOI [arXiv:2004.10934, 10.48550/arXiv.2004.10934]
- [9] Bowman Samuel R., 2016, P 20 SIGNLL C COMPUT, P10, DOI 10.18653/v1/K16-1002
- [10] "Factual" or "Emotional": Stylized Image Captioning with Adaptive Learning and Attention [J]. COMPUTER VISION - ECCV 2018, PT X, 2018, 11214 : 527 - 543