CTR-Driven Advertising Image Generation with Multimodal Large Language Models

Cited by: 0
Authors
Chen, Xingye [1 ]
Feng, Wei [2 ]
Du, Zhenbang [1 ]
Wang, Weizhen [2 ]
Chen, Yanyin [2 ]
Wang, Haohan [2 ]
Liu, Linkai [3 ]
Li, Yaoyu [2 ]
Zhao, Jinyuan [2 ]
Li, Yu [2 ]
Zhang, Zheng [2 ]
Lv, Jingjing [2 ]
Shen, Junjie [2 ]
Lin, Zhangang [2 ]
Shao, Jingping [2 ]
Shao, Yuanjie [1 ]
You, Xinge [1 ]
Gao, Changxin [1 ]
Sang, Nong [1 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China
[2] JD COM, Beijing, Peoples R China
[3] Sun Yat Sen Univ, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE ACM WEB CONFERENCE 2025, WWW 2025 | 2025
Funding
National Key R&D Program of China;
Keywords
CTR-Driven; Advertising Image Generation; Online Advertising; Multimodal Large Language Models;
DOI
10.1145/3696410.3714836
CLC Classification Number
TP39 [Computer Applications];
Subject Classification Codes
081203 ; 0835 ;
Abstract
In web data, advertising images are crucial for capturing user attention and improving advertising effectiveness. Most existing methods that generate backgrounds for products focus primarily on aesthetic quality and may therefore fail to achieve satisfactory online performance. To address this limitation, we explore the use of Multimodal Large Language Models (MLLMs) for generating advertising images, optimizing Click-Through Rate (CTR) as the primary objective. First, we build targeted pre-training tasks and leverage a large-scale e-commerce multimodal dataset to equip MLLMs with initial capabilities for advertising image generation. To further improve the CTR of generated images, we propose a novel reward model that jointly exploits multimodal features and accurately reflects user click preferences, and use it to fine-tune the pre-trained MLLMs through Reinforcement Learning (RL). Meanwhile, a product-centric preference optimization strategy ensures that the generated background content remains aligned with product characteristics after fine-tuning, enhancing the overall relevance and effectiveness of the advertising images. Extensive experiments demonstrate that our method achieves state-of-the-art performance on both online and offline metrics. Our code and pre-trained models are publicly available at: https://github.com/Chenguoz/CAIG.
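The abstract's core idea, fine-tuning a generator with RL against a learned CTR reward model, can be illustrated with a toy REINFORCE loop. Everything here is an illustrative assumption, not the paper's implementation: `ctr_reward` stands in for the learned multimodal reward model, and a Gaussian "policy" over feature vectors stands in for the MLLM generator.

```python
import math
import random

def ctr_reward(image_feat, product_feat):
    # Stand-in for the learned CTR reward model: a logistic score over
    # the dot product of generated-image and product feature vectors.
    s = sum(a * b for a, b in zip(image_feat, product_feat))
    return 1.0 / (1.0 + math.exp(-s))

def reinforce_step(policy_mean, product_feat, lr=0.1, n_samples=8, rng=None):
    """One REINFORCE update: sample candidate backgrounds from a unit-variance
    Gaussian policy, score them with the CTR reward model, and move the policy
    mean toward high-reward samples (baseline-subtracted advantages)."""
    rng = rng or random.Random(0)
    samples = []
    for _ in range(n_samples):
        feat = [m + rng.gauss(0, 1) for m in policy_mean]
        samples.append((feat, ctr_reward(feat, product_feat)))
    baseline = sum(r for _, r in samples) / n_samples  # variance-reduction baseline
    grad = [0.0] * len(policy_mean)
    for feat, r in samples:
        adv = r - baseline
        for i, (f, m) in enumerate(zip(feat, policy_mean)):
            grad[i] += adv * (f - m)  # grad of log N(f; m, 1) w.r.t. m
    return [m + lr * g / n_samples for m, g in zip(policy_mean, grad)]
```

Repeating `reinforce_step` drives the policy mean toward feature vectors the reward model scores highly, which is the same optimization pressure the paper applies to an MLLM, albeit with a far richer policy, reward model, and a product-centric preference constraint this sketch omits.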
Pages: 2262 / 2275
Page count: 14