A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation

Cited by: 10
Authors
Zhong, Shuhan [1]
Song, Sizhe [1]
Li, Guanyao [1, 2]
Chan, S-H Gary [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Guangzhou Urban Planning & Design Survey Res Inst, Guangdong Enterprise Key Lab Urban Sensing Monito, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Image-To-Markup Generation; Tree-Structured Attention; Tree Decoder; Tree Generation; RECOGNITION; COMPETITION;
DOI
10.1145/3503161.3548424
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Image-to-markup generation aims at translating an image into markup (structured language) that represents both the contents and the structural semantics corresponding to the image. Recent encoder-decoder based approaches typically employ string decoders to model the string representation of the target markup, which cannot effectively capture the rich embedded structural information. In this paper, we propose TSDNet, a novel Tree-based Structure-aware Transformer Decoder Network to directly generate the tree representation of the target markup in a structure-aware manner. Specifically, our model learns to sequentially predict the node attributes, edge attributes, and node connectivities by multi-task learning. Meanwhile, we introduce a novel tree-structured attention to our decoder such that it can directly operate on the partial tree generated in each step to fully exploit the structural information. TSDNet doesn't rely on any prior assumptions on the target tree structure, and can be jointly optimized with encoders in an end-to-end fashion. We evaluate the performance of our model on public image-to-markup generation datasets, and demonstrate its ability to learn the complicated correlation from the structural information in the target markup with significant improvement over state-of-the-art methods by up to 5.6% in mathematical expression recognition and up to 35.34% in chemical formula recognition.
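As a rough illustration of the idea of generating a tree rather than a string, the sketch below builds a small expression tree one step at a time, where each step supplies a node attribute (the symbol), an edge attribute (the relation to its parent), and a connectivity (the index of an earlier node). This is not TSDNet or the authors' code: the edge labels `sub`, `sup`, and `right`, the `Node` class, and the helper names `build_tree` and `to_latex` are all hypothetical choices made for this example, and no neural network is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # node attribute (a symbol)
    children: list = field(default_factory=list)   # (edge_label, Node) pairs


def build_tree(steps):
    """Build a tree from (label, edge_label, parent_index) steps.

    Each step may only attach to a node created by an earlier step, so the
    structure grows as a partial tree. The first step is the root (parent -1).
    """
    nodes = []
    for label, edge, parent in steps:
        node = Node(label)
        if parent >= 0:
            nodes[parent].children.append((edge, node))
        nodes.append(node)
    return nodes[0]


def to_latex(node):
    """Serialize the tree to a LaTeX-like string (hypothetical edge labels)."""
    out = node.label
    # Emit subscripts, then superscripts, then whatever follows on the baseline.
    for wanted, prefix, suffix in (("sub", "_{", "}"),
                                   ("sup", "^{", "}"),
                                   ("right", "", "")):
        for edge, child in node.children:
            if edge == wanted:
                out += prefix + to_latex(child) + suffix
    return out


# Assumed toy target: the expression x^{2}+1 as a sequence of tree steps.
steps = [
    ("x", "root", -1),   # step 0: root symbol
    ("2", "sup", 0),     # step 1: superscript of node 0
    ("+", "right", 0),   # step 2: to the right of node 0
    ("1", "right", 2),   # step 3: to the right of node 2
]
latex = to_latex(build_tree(steps))
print(latex)
```

In a learned decoder, the three fields of each step would be predicted by the model rather than written by hand; the sketch only shows the kind of target that such a sequence of predictions could describe.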
Pages: 5751-5760
Number of pages: 10