Learning Image Captioning as a Structured Transduction Task

Times cited: 0
Authors
Bacciu, Davide [1]
Serramazza, Davide [1]
Affiliation
[1] Università di Pisa, Dipartimento di Informatica, Largo B. Pontecorvo 3, Pisa, Italy
Source
ENGINEERING APPLICATIONS OF NEURAL NETWORKS, EAAAI/EANN 2022 | 2022 / Vol. 1600
Keywords
Structured transductions; Image captioning; Learning for structured data
DOI
10.1007/978-3-031-08223-8_20
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning is typically approached with deep encoder-decoder architectures, in which the encoder operates on a flat representation of the image while the decoder works on a sequential representation of natural-language sentences. These encoder-decoder architectures thus implement a simple and very specific form of structured transduction, that is, a generalization of predictive problems in which the input data and the output predictions may have substantially different structures and topologies. In this paper, we explore a generalization of this approach by addressing image captioning as a general structured transduction problem. In particular, we provide a framework in which both the input and the output information can be given a tree-structured representation, which makes it possible to account for the hierarchical nature underlying both images and sentences. To this end, we introduce a method for generating tree-structured representations from images, together with an autoencoder that operates on this kind of data. We empirically assess our approach on both synthetic and realistic tasks.
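To make "tree-structured representation" concrete, here is one standard way a neural model can map a whole tree to a single vector: a bottom-up recursive encoder in which each node's state depends on its own features and on its children's states, so the root state summarizes the tree. This is a minimal, stdlib-only sketch under stated assumptions, not the authors' architecture: the image tree (a scene root with region children), the 2-dimensional node features, summing over children, and all names (`Tree`, `encode`, `random_matrix`) are hypothetical, and the weights are random rather than trained.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tree:
    """Hypothetical node type: a feature vector (the node label) plus children."""
    label: List[float]
    children: List["Tree"] = field(default_factory=list)


def random_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(node: Tree, w_label, w_child, dim: int) -> List[float]:
    """Bottom-up recursive encoding: children are encoded first, their states
    are summed, and the node state is tanh(W_label * label + W_child * sum)."""
    child_sum = [0.0] * dim
    for child in node.children:
        h = encode(child, w_label, w_child, dim)
        child_sum = [a + b for a, b in zip(child_sum, h)]
    state = []
    for i in range(dim):
        pre = sum(w * x for w, x in zip(w_label[i], node.label))
        pre += sum(w * h for w, h in zip(w_child[i], child_sum))
        state.append(math.tanh(pre))
    return state


# Hypothetical image tree: a scene root whose children are regions,
# one of which is split further into two sub-regions.
image_tree = Tree([1.0, 0.0], [
    Tree([0.2, 0.8]),
    Tree([0.5, 0.5], [Tree([0.9, 0.1]), Tree([0.3, 0.7])]),
])

rng = random.Random(0)
DIM, LABEL_SIZE = 4, 2
w_label = random_matrix(DIM, LABEL_SIZE, rng)
w_child = random_matrix(DIM, DIM, rng)
root_state = encode(image_tree, w_label, w_child, DIM)
print(len(root_state))  # prints 4: one fixed-size vector for the whole tree
```

Summing child states is one simple choice that handles any number of children but ignores their order; models for ordered trees often use position-specific weights instead. The decoding half of a tree autoencoder, which must also predict how many children each node has, is not covered by this sketch.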
Pages: 235-246
Number of pages: 12