39 entries in total
[1]
Ji J, Luo Y, Sun X, Chen F, Luo G, Wu Y, et al., Improving image captioning by leveraging intra- and inter-layer global representation in Transformer network, Proceedings of the AAAI Conference on Artificial Intelligence, Virtual Conference, pp. 1655-1663, (2021)
[2]
Fang Z, Wang J, Hu X, Liang L, Gan Z, Wang L, et al., Injecting semantic concepts into end-to-end image captioning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18009-18019, (2022)
[3]
Tan J H, Tan Y H, Chan C S, Chuah J H., ACORT: A compact object relation transformer for parameter efficient image captioning, Neurocomputing, 482, pp. 60-72, (2022)
[4]
Fei Z., Attention-aligned Transformer for image captioning, Proceedings of the AAAI Conference on Artificial Intelligence, pp. 607-615, (2022)
[5]
Stefanini M, Cornia M, Baraldi L, Cascianelli S, Fiameni G, Cucchiara R., From show to tell: A survey on deep learning-based image captioning, IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, 1, pp. 539-559, (2022)
[6]
Vinyals O, Toshev A, Bengio S, Erhan D., Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 4, pp. 652-663, (2016)
[7]
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, et al., Attention is all you need, Proceedings of Advances in Neural Information Processing Systems, pp. 5998-6008, (2017)
[8]
Cover T M, Thomas J A., Elements of Information Theory, (2012)
[9]
Lin T Y, Maire M, Belongie S J, Hays J, Perona P, Ramanan D, et al., Microsoft COCO: Common objects in context, Proceedings of European Conference on Computer Vision, pp. 740-755, (2014)
[10]
Qin Y, Du J, Zhang Y, Lu H., Look back and predict forward in image captioning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8367-8375, (2019)