Toward Attribute-Controlled Fashion Image Captioning

Cited by: 0

Authors:

Cai, Chen [1]

Yap, Kim-Hui [1]

Wang, Suchen [1]

Affiliations:

[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore, Singapore

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024 / Vol. 20 / Issue 9

Keywords:

Fashion; image captioning; controllable; semantic understanding; dataset; ATTENTION;

DOI:

10.1145/3671000

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Fashion image captioning is a critical task in the fashion industry that aims to automatically generate product descriptions for fashion items. However, existing fashion image captioning models predict a fixed caption for a particular fashion item once deployed, which does not cater to unique preferences. We explore a controllable way of fashion image captioning that allows the users to specify a few semantic attributes to guide the caption generation. Our approach utilizes semantic attributes as a control signal, giving users the ability to specify particular fashion attributes (e.g., stitch, knit, sleeve) and styles (e.g., cool, classic, fresh) that they want the model to incorporate when generating captions. By providing this level of customization, our approach creates more personalized and targeted captions that suit individual preferences. To evaluate the effectiveness of our proposed approach, we clean, filter, and assemble a new fashion image caption dataset called FACAD170K from the current FACAD dataset. This dataset facilitates learning and enables us to investigate the effectiveness of our approach. Our results demonstrate that our proposed approach outperforms existing fashion image captioning models as well as conventional captioning methods. Besides, we further validate the effectiveness of the proposed method on the MSCOCO and Flickr30K captioning datasets and achieve competitive performance.

Pages: 18
