[31]
Rennie S J, Marcheret E, Mroueh Y, Ross J, Goel V., Self-critical sequence training for image captioning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7008-7024, (2017)
[32]
Vedantam R, Zitnick C L, Parikh D., CIDEr: Consensus-based image description evaluation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4566-4575, (2015)
[33]
Karpathy A, Fei-Fei L., Deep visual-semantic alignments for generating image descriptions, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3128-3137, (2015)
[34]
Papineni K, Roukos S, Ward T, Zhu W J., BLEU: A method for automatic evaluation of machine translation, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318, (2002)
[35]
Denkowski M J, Lavie A., Meteor Universal: Language specific translation evaluation for any target language, Proceedings of the 9th Workshop on Statistical Machine Translation, pp. 376-380, (2014)
[36]
Lin C Y., ROUGE: A package for automatic evaluation of summaries, Proceedings of Workshop on Text Summarization Branches Out, Post-Conference Workshop of ACL 2004, pp. 74-81, (2004)
[37]
Anderson P, Fernando B, Johnson M, Gould S., SPICE: Semantic propositional image caption evaluation, Proceedings of European Conference on Computer Vision, pp. 382-398, (2016)
[38]
Krishna R, Zhu Y, Groth O, Johnson J, Hata K, Kravitz J, et al., Visual Genome: Connecting language and vision using crowd-sourced dense image annotations, International Journal of Computer Vision, 123, 1, pp. 32-73, (2017)
[39]
Liu B, Wang D, Yang X, Zhou Y, Yao R, Shao Z, et al., Show, deconfound and tell: Image captioning with causal inference, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 18041-18050, (2022)