Condition-CNN: A hierarchical multi-label fashion image classification model

Cited by: 40
Authors
Kolisnik, Brendan [1]
Hogan, Isaac [1]
Zulkernine, Farhana [1]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Condition-CNN; Branching convolutional neural networks; Image classification; Convolutional neural networks; Hierarchical image classification; Teacher Forcing;
DOI
10.1016/j.eswa.2021.115195
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Current state-of-the-art image classifiers predict a single class label for an image. However, in many industry settings such as online shopping, images belong to a class hierarchy in which the first level represents the coarse-grained, most abstract class and subsequent levels represent progressively more specific classes. We propose a novel hierarchical image classification model, Condition-CNN, which addresses some of the shortcomings of the branching convolutional neural network (B-CNN) in terms of training time and fine-grained accuracy. It applies the Teacher Forcing training algorithm, in which the actual class labels of the higher-level classes, rather than the predicted labels, are used to train the lower-level branches. This prevents error propagation during training and thereby reduces the training time. Besides learning the image features for each level of classes, Condition-CNN also learns the relationships between different levels of classes as conditional probabilities, which are used to estimate class predictions during scoring. By feeding the estimated higher-level class predictions as priors to the lower-level class prediction, Condition-CNN achieves superior prediction accuracy while requiring fewer trainable parameters than the baseline CNN models. Validation results for Condition-CNN on the Kaggle Fashion Product Images data set show prediction accuracies of 99.8%, 98.1%, and 91.0% for Level 1, 2, and 3 classes respectively, which are higher than those of B-CNN and the other baseline CNN models. Moreover, Condition-CNN used only 77.58% as many trainable parameters as B-CNN.
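The conditioning idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the additive way the parent prior is combined with the fine-level logits, and the toy weight matrix `cond_weights` are all assumptions made for illustration. It shows only the general pattern of using the true parent label during training (Teacher Forcing) and the predicted parent distribution during scoring.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_fine(fine_logits, coarse_logits, cond_weights, true_coarse=None):
    """Return fine-level probabilities conditioned on the coarse level.

    cond_weights[i][j] is a hypothetical learned score linking coarse
    class i to fine class j (an assumed stand-in for the learned
    conditional relationship between levels).
    """
    n_coarse = len(coarse_logits)
    if true_coarse is not None:
        # Training with Teacher Forcing: use the ground-truth parent label.
        parent = [1.0 if i == true_coarse else 0.0 for i in range(n_coarse)]
    else:
        # Scoring: use the model's own estimate of the parent class.
        parent = softmax(coarse_logits)
    # Assumed combination rule: add the parent-weighted scores to the logits.
    conditioned = [
        fine_logits[j] + sum(parent[i] * cond_weights[i][j] for i in range(n_coarse))
        for j in range(len(fine_logits))
    ]
    return softmax(conditioned)

# Toy hierarchy: coarse class 0 owns fine classes 0 and 1, coarse class 1 owns fine class 2.
cond_weights = [[2.0, 2.0, -2.0], [-2.0, -2.0, 2.0]]
fine_logits = [0.1, 0.0, 0.2]   # ambiguous on its own, slightly favours fine class 2
coarse_logits = [3.0, 0.0]      # confident in coarse class 0

train_probs = predict_fine(fine_logits, coarse_logits, cond_weights, true_coarse=0)
infer_probs = predict_fine(fine_logits, coarse_logits, cond_weights)
```

In this toy setting the unconditioned fine logits slightly favour a class outside the predicted parent, while both the teacher-forced and the scoring-time outputs favour a fine class under coarse class 0. In a trained network the weights would be learned jointly with the image features rather than set by hand.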
Pages: 14