A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation

Cited by: 10
Authors
Zhong, Shuhan [1]
Song, Sizhe [1]
Li, Guanyao [1, 2]
Chan, S-H Gary [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Guangzhou Urban Planning & Design Survey Res Inst, Guangdong Enterprise Key Lab Urban Sensing Monito, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Image-To-Markup Generation; Tree-Structured Attention; Tree Decoder; Tree Generation; RECOGNITION; COMPETITION;
DOI
10.1145/3503161.3548424
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Image-to-markup generation aims at translating an image into markup (structured language) that represents both the contents and the structural semantics corresponding to the image. Recent encoder-decoder based approaches typically employ string decoders to model the string representation of the target markup, which cannot effectively capture the rich embedded structural information. In this paper, we propose TSDNet, a novel Tree-based Structure-aware Transformer Decoder Network to directly generate the tree representation of the target markup in a structure-aware manner. Specifically, our model learns to sequentially predict the node attributes, edge attributes, and node connectivities by multi-task learning. Meanwhile, we introduce a novel tree-structured attention to our decoder such that it can directly operate on the partial tree generated in each step to fully exploit the structural information. TSDNet doesn't rely on any prior assumptions on the target tree structure, and can be jointly optimized with encoders in an end-to-end fashion. We evaluate the performance of our model on public image-to-markup generation datasets, and demonstrate its ability to learn the complicated correlation from the structural information in the target markup with significant improvement over state-of-the-art methods by up to 5.6% in mathematical expression recognition and up to 35.34% in chemical formula recognition.
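The idea of decoding a tree step by step, with each step predicting a node attribute, an edge attribute, and a connection to an existing node, can be illustrated with a minimal sketch. This is not the TSDNet implementation: the `Node` class, the edge labels (`"sub"`, `"sup"`, `"right"`), the `(label, edge, parent_index)` step format, and the serialization rules are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                     # node attribute, e.g. a LaTeX symbol
    edge: Optional[str] = None     # edge attribute linking this node to its parent
    children: List["Node"] = field(default_factory=list)


# One decoding step: (node label, edge label, index of the parent node).
# The root has no edge and no parent. This format is hypothetical.
Step = Tuple[str, Optional[str], Optional[int]]


def build_tree(steps: List[Step]) -> Node:
    """Grow a tree one node per step, attaching each node to an earlier one."""
    nodes: List[Node] = []
    for label, edge, parent in steps:
        node = Node(label, edge)
        if parent is not None:
            nodes[parent].children.append(node)
        nodes.append(node)
    return nodes[0]


def to_latex(node: Node) -> str:
    """Serialize the tree to a LaTeX-like string (assumed rules)."""
    out = node.label
    # Emit subscripts first, then superscripts, each wrapped in braces.
    for edge, fmt in (("sub", "_{%s}"), ("sup", "^{%s}")):
        for child in node.children:
            if child.edge == edge:
                out += fmt % to_latex(child)
    # Symbols to the right continue the expression on the same baseline.
    for child in node.children:
        if child.edge == "right":
            out += " " + to_latex(child)
    return out


# Steps for the expression x^{2} + y_{i}.
STEPS: List[Step] = [
    ("x", None, None),
    ("2", "sup", 0),
    ("+", "right", 0),
    ("y", "right", 2),
    ("i", "sub", 3),
]

print(to_latex(build_tree(STEPS)))  # x^{2} + y_{i}
```

In this sketch the structure is explicit at every step: the superscript `2` is attached directly to `x` rather than being implied by braces in a flat token string, which is the kind of structural information the abstract says string decoders fail to capture.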
Pages: 5751-5760
Number of pages: 10
Related papers
50 in total
  • [1] StructCoder: Structure-Aware Transformer for Code Generation
    Tipirneni, Sindhu
    Zhu, Ming
    Reddy, Chandan K.
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (03)
  • [2] Image-to-Markup Generation with Coarse-to-Fine Attention
    Deng, Yuntian
    Kanervisto, Anssi
    Ling, Jeffrey
    Rush, Alexander M.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [3] Image-to-Markup Generation via Paired Adversarial Learning
    Wu, Jin-Wen
    Yin, Fei
    Zhang, Yan-Ming
    Zhang, Xu-Yao
    Liu, Cheng-Lin
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2018, PT I, 2019, 11051 : 18 - 34
  • [4] Structure-Aware Transformer for Shadow Detection
    Sun, Wanlu
    Xiang, Liyun
    Zhao, Wei
    IET IMAGE PROCESSING, 2025, 19 (01)
  • [5] GraphGST: Graph Generative Structure-Aware Transformer for Hyperspectral Image Classification
    Jiang, Mengying
    Su, Yuanchao
    Gao, Lianru
    Plaza, Antonio
    Zhao, Xi-Le
    Sun, Xu
    Liu, Guizhong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 16
  • [6] TreeGen: A Tree-Based Transformer Architecture for Code Generation
    Sun, Zeyu
    Zhu, Qihao
    Xiong, Yingfei
    Sun, Yican
    Mou, Lili
    Zhang, Lu
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8984 - 8991
  • [7] Structure-aware image fusion
    Li, Wen
    Xie, Yuange
    Zhou, Haole
    Han, Ying
    Zhan, Kun
    OPTIK, 2018, 172 : 1 - 11
  • [8] Structure-Aware Procedural Text Generation From an Image Sequence
    Nishimura, Taichi
    Hashimoto, Atsushi
    Ushiku, Yoshitaka
    Kameko, Hirotaka
    Yamakata, Yoko
    Mori, Shinsuke
    IEEE ACCESS, 2021, 9 : 2125 - 2141
  • [9] Domain-based structure-aware image inpainting
    Wei, Yinwei
    Liu, Shiguang
    SIGNAL IMAGE AND VIDEO PROCESSING, 2016, 10 (05) : 911 - 919