A Tree-Based Structure-Aware Transformer Decoder for Image-To-Markup Generation

Times cited: 10

Authors:

Zhong, Shuhan [1]

Song, Sizhe [1]

Li, Guanyao [1, 2]

Chan, S-H Gary [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China

[2] Guangzhou Urban Planning & Design Survey Res Inst, Guangdong Enterprise Key Lab Urban Sensing Monito, Guangzhou, Peoples R China

Source:

PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022 | 2022

Keywords:

Image-To-Markup Generation; Tree-Structured Attention; Tree Decoder; Tree Generation; RECOGNITION; COMPETITION

DOI:

10.1145/3503161.3548424

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification code:

081203; 0835

Abstract:

Image-to-markup generation aims to translate an image into markup (a structured language) that represents both the contents and the structural semantics of the image. Recent encoder-decoder approaches typically employ string decoders that model the string representation of the target markup, which cannot effectively capture the rich structural information embedded in it. In this paper, we propose TSDNet, a novel Tree-based Structure-aware Transformer Decoder Network that directly generates the tree representation of the target markup in a structure-aware manner. Specifically, our model learns to sequentially predict node attributes, edge attributes, and node connectivities through multi-task learning. Meanwhile, we introduce a novel tree-structured attention into our decoder so that it can operate directly on the partial tree generated at each step to fully exploit the structural information. TSDNet does not rely on any prior assumptions about the target tree structure and can be jointly optimized with encoders in an end-to-end fashion. We evaluate our model on public image-to-markup generation datasets and demonstrate its ability to learn complicated correlations from the structural information in the target markup, with significant improvements over state-of-the-art methods of up to 5.6% in mathematical expression recognition and up to 35.34% in chemical formula recognition.
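The abstract describes decoding a markup tree step by step, predicting at each step a node attribute, an edge attribute, and which earlier node the new node connects to, while attention is restricted by the partial tree built so far. The Python sketch below illustrates one possible way such a target could be linearized and masked. It is a toy under stated assumptions, not TSDNet's implementation: the `Node` class, the pre-order traversal, the edge labels `sup` and `right`, and the ancestor-only attention mask are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical step format: (node attribute, edge attribute, parent index).
Step = Tuple[str, Optional[str], int]


@dataclass
class Node:
    label: str                    # node attribute, e.g. a symbol such as "x"
    edge: Optional[str] = None    # edge attribute linking this node to its parent
    children: List["Node"] = field(default_factory=list)


def linearize(root: Node) -> List[Step]:
    """Pre-order walk emitting one (node, edge, parent) step per node.

    Parents are always emitted before their children, so every parent
    index points at an earlier step; the root's parent index is -1.
    """
    steps: List[Step] = []

    def visit(node: Node, parent_idx: int) -> None:
        idx = len(steps)
        steps.append((node.label, node.edge, parent_idx))
        for child in node.children:
            visit(child, idx)

    visit(root, -1)
    return steps


def ancestor_mask(steps: List[Step]) -> List[List[bool]]:
    """mask[i][j] is True when step j is step i itself or one of its ancestors.

    Assumption for illustration only: a simple ancestor-only rule. Because
    ancestors precede descendants in pre-order, the mask never lets a step
    see a later one, so it is compatible with left-to-right decoding.
    """
    n = len(steps)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        j = i
        while j != -1:
            mask[i][j] = True
            j = steps[j][2]  # follow the parent pointer upward
    return mask


if __name__ == "__main__":
    # Toy tree for "x^{2} + y" using made-up relations "sup" and "right".
    tree = Node("x", children=[
        Node("2", "sup"),
        Node("+", "right", [Node("y", "right")]),
    ])
    steps = linearize(tree)
    for step in steps:
        print(step)
    for row in ancestor_mask(steps):
        print([int(allowed) for allowed in row])
```

In this toy, the mask row for `y` admits `y`, `+`, and `x` but not the superscript `2`, which sits on a different branch. A learned tree-structured attention could weigh other relations (for example siblings) as well; the ancestor-only rule is just one simple way to make attention depend on the partial tree.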


Pages: 5751-5760

Page count: 10
