Image and Video Captioning for Apparels Using Deep Learning

Cited by: 2

Authors:

Agarwal, Govind [1]

Jindal, Kritika [1]

Chowdhury, Abishi [1]

Singh, Vishal K. [2]

Pal, Amrit [1]

Affiliations:

[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai 600127, India

[2] Univ Essex, Sch Comp Sci & Elect Engn, Colchester Campus, Colchester CO4 3SQ, England

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Feature extraction; Convolutional neural networks; Long short term memory; Clothing; Visualization; Deep learning; Computational modeling; YOLO; Apparel captioning; BLEU score; CNN; ConvNeXtLarge; LSTM

DOI:

10.1109/ACCESS.2024.3443422

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

In the rapidly evolving apparel market, clear and engaging product descriptions are crucial for attracting customers. Given the importance of automated descriptions for apparel, this work explores image captioning for apparel photos and extends it to video captioning, so that visually impaired people can access and understand dynamic apparel content. To address the issue of dataset diversity, we curated a collection of images divided into 26 classes. The proposed system combines Convolutional Neural Network (CNN) architectures such as ConvNeXtLarge with Long Short-Term Memory (LSTM) networks to automatically generate accurate and engaging captions for both still photos and videos that feature clothing. The CNN component extracts visual features from clothing photos and videos, and the LSTM network uses these features to produce captions that are both semantically and linguistically accurate. In addition, a YOLO model is included for real-time object detection, which allows the system to identify and track several articles of clothing at once. The proposed architecture is evaluated using the BLEU score; experiments on the curated dataset yielded a BLEU-1 score of 0.983 for the ConvNeXtLarge-based model.
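To make the reported metric concrete, the following is a minimal sketch of how a BLEU-1 score can be computed for a single caption: clipped unigram precision multiplied by a brevity penalty. The function name `bleu1`, the whitespace tokenization, and the example captions are assumptions for illustration only; the record does not say which BLEU implementation or preprocessing the authors used, and their score is presumably aggregated over a whole test set.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Illustrative only; tokenization is a plain lowercase whitespace split.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand:
        return 0.0

    cand_counts = Counter(cand)

    # For each word, the largest count observed in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for word, n in Counter(ref).items():
            max_ref_counts[word] = max(max_ref_counts[word], n)

    # Clip each candidate word count by its maximum reference count.
    clipped = sum(min(n, max_ref_counts[w]) for w, n in cand_counts.items())
    precision = clipped / len(cand)

    # Use the reference length closest to the candidate length
    # (ties resolved toward the shorter reference).
    ref_len = min((len(r) for r in refs),
                  key=lambda n: (abs(n - len(cand)), n))

    # Penalize candidates that are shorter than the chosen reference.
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision


# Hypothetical captions, not taken from the paper's dataset.
score = bleu1("a red shirt", ["a red cotton shirt"])
print(round(score, 3))
```

A score near 1.0, such as the 0.983 reported above, means that almost every word in the generated captions also appears in the reference captions and that the captions are not noticeably shorter than the references. BLEU-1 does not measure word order, which is why higher-order BLEU variants are often reported alongside it.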


Pages: 113138-113150

Page count: 13

