Smaller But Better: Unifying Layout Generation with Smaller Large Language Models

Cited by: 0
Authors
Zhang, Peirong [1 ]
Zhang, Jiaxin [1 ]
Cao, Jiahuan [1 ]
Li, Hongliang [1 ]
Jin, Lianwen [1 ]
Affiliations
[1] South China University of Technology, School of Electronic and Information Engineering, Guangzhou, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Large language model; Generative modeling; Unified layout generation;
DOI
10.1007/s11263-025-02353-2
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose LGGPT, an LLM-based model tailored for unified layout generation. First, we propose Arbitrary Layout Instruction (ALI) and Universal Layout Response (ULR) as the uniform I/O template. ALI accommodates arbitrary layout generation task inputs across multiple layout domains, enabling LGGPT to unify task-generic and domain-generic layout generation, a combination hitherto unexplored. Collectively, ALI and ULR boast a succinct structure that forgoes superfluous tokens typically found in existing HTML-based formats, facilitating efficient instruction tuning and boosting unified generation performance. In addition, we propose an Interval Quantization Encoding (IQE) strategy that compresses ALI into a more condensed structure. IQE precisely preserves valid layout clues while eliminating the less informative placeholders, enabling LGGPT to capture complex and variable layout generation conditions during the unified training process. Experimental results demonstrate that LGGPT achieves performance superior or on par with existing methods. Notably, LGGPT strikes a prominent balance between proficiency and efficiency with a compact 1.5B-parameter LLM, which beats prior 7B or 175B models even in the most extensive and challenging unified scenario. Furthermore, we underscore the necessity of employing LLMs for unified layout generation and suggest that 1.5B could be an optimal parameter size by comparing LLMs of varying scales. Code is available at https://github.com/NiceRingNode/LGGPT.
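The abstract does not detail the exact ALI/ULR token format or the IQE mapping, so the Python sketch below is only a loose illustration of the general idea described above: serializing layout elements as a compact, placeholder-free instruction with quantized coordinates. All names (quantize_box, serialize_layout), the bin count, and the "category x y w h" token format are assumptions made for illustration, not LGGPT's actual encoding.

    # Hypothetical sketch of a compact layout serialization with coordinate
    # quantization; illustrative only, not the paper's ALI/ULR or IQE code.

    def quantize_box(box, bins=128):
        """Map a normalized (x, y, w, h) box in [0, 1] to discrete integer bins."""
        return tuple(min(int(v * bins), bins - 1) for v in box)

    def serialize_layout(elements, bins=128):
        """Serialize elements as 'category x y w h' groups, dropping HTML-style
        tags and placeholders for unspecified values to keep the prompt short."""
        parts = []
        for category, box in elements:
            if box is None:  # geometry left for the model to generate: no placeholder tokens
                parts.append(category)
            else:
                parts.append(category + " " + " ".join(str(v) for v in quantize_box(box, bins)))
        return " | ".join(parts)

    if __name__ == "__main__":
        elements = [
            ("title", (0.10, 0.05, 0.80, 0.10)),
            ("image", (0.10, 0.20, 0.80, 0.40)),
            ("text", None),  # element whose box the model should predict
        ]
        print(serialize_layout(elements))
        # -> title 12 6 102 12 | image 12 25 102 51 | text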
Pages: 3891-3917
Number of pages: 27
相关论文
共 80 条
  • [1] Ainslie J., Lee-Thorp J., de Jong M., Zemlyanskiy Y., Lebron F., Sanghai S., GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints, EMNLP, pp. 4895-4901, (2023)
  • [2] Anil R., Dai A.M., Et al., (2023)
  • [3] Arroyo D.M., Postels J., Tombari F., Variational Transformer Networks for Layout Generation, CVPR, pp. 13642-13652, (2021)
  • [4] Blumenthal S., Multinomial sampling with partially categorized data, Journal of the American Statistical Association, 63, 322, pp. 542-551, (1968)
  • [5] Brown T., Mann B., Et al., Language Models are Few-Shot Learners, In: NeurIPS, 33, pp. 1877-1901, (2020)
  • [6] Chai S., Zhuang L., Yan F., LayoutDM: Transformer-Based Diffusion Model for Layout Generation. In: CVPR, pp. 18349-18358, (2023)
  • [7] Chowdhery A., Narang S., Et al., PaLM: Scaling Language Modeling with Pathways, Journal of Machine Learning Research, 24, 240, pp. 1-113, (2023)
  • [8] Chung H.W., Hou L., Et al., Scaling Instruction-Finetuned Language Models, (2022)
  • [9] Deka B., Huang Z., Franzen C., Hibschman J., Afergan D., Li Y., Nichols J., Kumar R., Rico: A Mobile App Dataset for Building Data-Driven Design Applications, UIST, pp. 845-854, (2017)
  • [10] Devlin J., Chang M.W., Lee K., Toutanova K., BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In: NAACL, pp. 4171-4186, (2019)