A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation

Cited by: 10
Authors
Zhong, Shuhan [1]
Song, Sizhe [1]
Li, Guanyao [1, 2]
Chan, S-H Gary [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Guangzhou Urban Planning & Design Survey Res Inst, Guangdong Enterprise Key Lab Urban Sensing Monito, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Image-To-Markup Generation; Tree-Structured Attention; Tree Decoder; Tree Generation; RECOGNITION; COMPETITION;
DOI
10.1145/3503161.3548424
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Image-to-markup generation aims at translating an image into markup (structured language) that represents both the contents and the structural semantics of the image. Recent encoder-decoder approaches typically employ string decoders to model the string representation of the target markup, which cannot effectively capture the rich structural information embedded in it. In this paper, we propose TSDNet, a novel Tree-based Structure-aware Transformer Decoder Network that directly generates the tree representation of the target markup in a structure-aware manner. Specifically, our model learns to sequentially predict the node attributes, edge attributes, and node connectivities by multi-task learning. Meanwhile, we introduce a novel tree-structured attention into our decoder so that it can operate directly on the partial tree generated at each step and fully exploit the structural information. TSDNet does not rely on any prior assumptions about the target tree structure, and can be jointly optimized with encoders in an end-to-end fashion. We evaluate our model on public image-to-markup generation datasets and demonstrate its ability to learn the complicated correlations in the structural information of the target markup, with significant improvements over state-of-the-art methods of up to 5.6% in mathematical expression recognition and up to 35.34% in chemical formula recognition.
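The abstract describes generating a tree by predicting, step by step, a node attribute, an edge attribute, and a node connectivity. The following Python sketch illustrates that idea only; it is not the TSDNet implementation. The `Step` record, the relation names (`"right"`, `"sup"`, `"sub"`), and the ancestor-path mask are all hypothetical choices made for this example, and the mask in particular is an assumed stand-in rather than the paper's tree-structured attention.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One hypothetical decoding step: a new node attached to the partial tree."""
    symbol: str              # node attribute (e.g. a LaTeX token)
    relation: Optional[str]  # edge attribute to the parent; None for the root
    parent: Optional[int]    # node connectivity: index of the parent step


def ancestor_mask(steps: List[Step]) -> List[List[bool]]:
    """mask[i][j] is True when node j is node i itself or one of its ancestors.

    Assumption: restricting attention to the root-to-node path is only one
    simple way to make attention depend on tree structure.
    """
    n = len(steps)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j: Optional[int] = i
        while j is not None:
            mask[i][j] = True
            j = steps[j].parent
    return mask


def to_latex(steps: List[Step]) -> str:
    """Serialize the tree back into a markup string."""
    children = {i: [] for i in range(len(steps))}
    root = 0
    for i, step in enumerate(steps):
        if step.parent is None:
            root = i
        else:
            children[step.parent].append(i)

    def render(i: int) -> str:
        out = steps[i].symbol
        # Scripts attach to the symbol before anything placed to its right.
        for rel, fmt in (("sub", "_{%s}"), ("sup", "^{%s}")):
            for c in children[i]:
                if steps[c].relation == rel:
                    out += fmt % render(c)
        for c in children[i]:
            if steps[c].relation == "right":
                out += " " + render(c)
        return out

    return render(root)


# Hypothetical step sequence for the expression x^{2} + 1.
STEPS = [
    Step("x", None, None),
    Step("2", "sup", 0),
    Step("+", "right", 0),
    Step("1", "right", 2),
]

if __name__ == "__main__":
    print(to_latex(STEPS))
    for row in ancestor_mask(STEPS):
        print(row)
```

In this sketch each step carries its own parent index, so the structure is explicit in the target rather than implied by brace matching in a string, which is the contrast with string decoders that the abstract draws.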
Pages: 5751-5760
Page count: 10