Toward Attribute-Controlled Fashion Image Captioning

Times cited: 0
Authors
Cai, Chen [1]
Yap, Kim-Hui [1]
Wang, Suchen [1]
Affiliations
[1] Nanyang Technological University, School of Electrical and Electronic Engineering, Singapore, Singapore
Keywords
Fashion; image captioning; controllable; semantic understanding; dataset; attention
DOI
10.1145/3671000
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fashion image captioning is a critical task in the fashion industry that aims to automatically generate product descriptions for fashion items. However, once deployed, existing fashion image captioning models predict a single fixed caption for a given fashion item, which does not cater to individual user preferences. We explore a controllable approach to fashion image captioning that lets users specify a few semantic attributes to guide caption generation. Our approach uses semantic attributes as a control signal, allowing users to specify the fashion attributes (e.g., stitch, knit, sleeve) and styles (e.g., cool, classic, fresh) that they want the model to incorporate when generating captions. This level of customization yields more personalized and targeted captions that suit individual preferences. To evaluate the proposed approach, we clean, filter, and assemble a new fashion image caption dataset, FACAD170K, from the existing FACAD dataset; the dataset facilitates learning and supports this evaluation. Our results show that the proposed approach outperforms existing fashion image captioning models as well as conventional captioning methods. We further validate the proposed method on the MSCOCO and Flickr30K captioning datasets, where it achieves competitive performance.
Pages: 18