A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation

Cited by: 10
Authors
Zhong, Shuhan [1]
Song, Sizhe [1]
Li, Guanyao [1, 2]
Chan, S-H Gary [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
[2] Guangzhou Urban Planning & Design Survey Res Inst, Guangdong Enterprise Key Lab Urban Sensing Monito, Guangzhou, Peoples R China
Source
PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022
Keywords
Image-To-Markup Generation; Tree-Structured Attention; Tree Decoder; Tree Generation; RECOGNITION; COMPETITION;
DOI
10.1145/3503161.3548424
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Image-to-markup generation aims at translating an image into markup (structured language) that represents both the contents and the structural semantics corresponding to the image. Recent encoder-decoder based approaches typically employ string decoders to model the string representation of the target markup, which cannot effectively capture the rich embedded structural information. In this paper, we propose TSDNet, a novel Tree-based Structure-aware Transformer Decoder Network to directly generate the tree representation of the target markup in a structure-aware manner. Specifically, our model learns to sequentially predict the node attributes, edge attributes, and node connectivities by multi-task learning. Meanwhile, we introduce a novel tree-structured attention to our decoder such that it can directly operate on the partial tree generated in each step to fully exploit the structural information. TSDNet doesn't rely on any prior assumptions on the target tree structure, and can be jointly optimized with encoders in an end-to-end fashion. We evaluate the performance of our model on public image-to-markup generation datasets, and demonstrate its ability to learn the complicated correlation from the structural information in the target markup with significant improvement over state-of-the-art methods by up to 5.6% in mathematical expression recognition and up to 35.34% in chemical formula recognition.
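To make the idea of generating a tree step by step more concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical `Node` class and a toy tree for the expression x^2 + 1, and shows how a markup tree could be flattened into a sequence of (node attribute, edge attribute, parent index) steps, which mirrors the three quantities the abstract says the decoder predicts. The `ancestor_mask` function is only one illustrative way to restrict attention using tree structure; the abstract does not specify how TSDNet's tree-structured attention is defined, so treat the mask as an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical markup-tree node: a label plus labelled edges to children."""
    label: str
    children: list = field(default_factory=list)  # list of (edge_label, Node)


def linearize(root):
    """Flatten a tree depth-first into (node_label, edge_label, parent_index) steps.

    The root has edge_label None and parent_index -1.
    """
    steps = []
    stack = [(root, None, -1)]
    while stack:
        node, edge, parent = stack.pop()
        index = len(steps)
        steps.append((node.label, edge, parent))
        # Push children in reverse so the first child is visited first.
        for child_edge, child in reversed(node.children):
            stack.append((child, child_edge, index))
    return steps


def ancestor_mask(steps):
    """Boolean mask where step i may attend to itself and its ancestors only.

    This is an illustrative assumption, not the attention defined in the paper.
    """
    n = len(steps)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = steps[j][2]  # move to the parent index
    return mask


# Toy tree for "x^2 + 1": "x" has a superscript child "2" and a right sibling "+".
tree = Node("x", [
    ("sup", Node("2")),
    ("right", Node("+", [("right", Node("1"))])),
])

steps = linearize(tree)
mask = ancestor_mask(steps)
for step in steps:
    print(step)
```

Each printed step corresponds to one decoding step in this sketch: which symbol to emit, how it attaches, and which earlier node it attaches to. A string decoder would see only the symbol sequence and would have to infer the last two implicitly.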
Pages: 5751-5760
Page count: 10