Effect of fusing features from multiple DCNN architectures in image classification

Times Cited: 41
Authors
Akilan, Thangarajah [1 ]
Wu, Qingming Jonathan [1 ]
Zhang, Hui [2 ]
Affiliations
[1] Univ Windsor, Dept Elect & Comp Engn, 401 Sunset Ave, Windsor, ON, Canada
[2] Changsha Univ Sci & Technol, Coll Elect & Informat Engn, Changsha, Hunan, Peoples R China
Keywords
image classification; feature extraction; image representation; neural nets; principal component analysis; image reconstruction; computer vision; DCNN architectures; automatic image classification; pre-trained deep convolutional neural networks; generalised feature space; principal component reconstruction; energy-level normalisation; fused feature vectors; image statistics representation; multiclass linear support vector machine; FFV;
DOI
10.1049/iet-ipr.2017.0232
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Automatic image classification has become a necessary task for handling the rapidly growing volume of digital images. The field has produced many algorithms and adopted new techniques; among them, feature-fusion-based image classification methods have traditionally relied on hand-crafted features. However, it has been shown that bottleneck features extracted from pre-trained convolutional neural networks (CNNs) can improve classification accuracy. Hence, this study analyses the effect of fusing such cues from multiple architectures without relying on any hand-crafted features. First, CNN features are extracted from three different pre-trained models, namely AlexNet, VGG-16, and Inception-V3. Then, a generalised feature space is formed by employing principal component reconstruction and energy-level normalisation, where the features from each individual CNN are mapped into a common subspace and combined using arithmetic rules to construct fused feature vectors (FFVs). This transformation plays a vital role in creating an appearance-invariant representation that captures the complementary information of different high-level features. Finally, a multi-class linear support vector machine is trained on the FFVs. The experimental results demonstrate that such multi-modal CNN feature fusion is well suited to image/object classification tasks, yet, surprisingly, it has not been explored extensively by the computer vision research community.
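As a rough sketch of the pipeline the abstract describes, the snippet below fuses features from several networks via PCA projection into a common subspace, per-sample normalisation, an element-wise arithmetic combination, and a linear SVM. The subspace dimension, the L2 normalisation, and the summation rule are illustrative assumptions rather than the authors' exact settings, and random matrices stand in for real AlexNet, VGG-16, and Inception-V3 bottleneck features.

```python
# Illustrative sketch of multi-DCNN feature fusion followed by a linear SVM.
# Random matrices stand in for bottleneck features extracted from AlexNet,
# VGG-16, and Inception-V3; all dimensions and fusion choices below are
# assumptions for this sketch, not the paper's exact configuration.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_classes = 600, 10
y = rng.integers(0, n_classes, size=n_samples)

# Stand-ins for per-network CNN features (e.g. AlexNet fc7, VGG-16 fc7,
# Inception-V3 global pooling output).
feature_sets = {
    "alexnet": rng.normal(size=(n_samples, 4096)),
    "vgg16": rng.normal(size=(n_samples, 4096)),
    "inception_v3": rng.normal(size=(n_samples, 2048)),
}

# Map each feature set into a common subspace via PCA, then normalise so each
# network contributes at a comparable energy level (unit L2 norm per sample
# is one plausible choice).
common_dim = 256  # assumed subspace size
projected = []
for name, X in feature_sets.items():
    Xp = PCA(n_components=common_dim, random_state=0).fit_transform(X)
    projected.append(normalize(Xp))

# Fuse with a simple arithmetic rule; element-wise summation is one option
# among the arithmetic combinations the abstract alludes to.
ffv = np.sum(projected, axis=0)  # fused feature vectors (FFVs)

# Train and evaluate a multi-class linear SVM on the fused representation.
X_tr, X_te, y_tr, y_te = train_test_split(ffv, y, test_size=0.3, random_state=0)
clf = LinearSVC(C=1.0, max_iter=5000).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```

With real bottleneck features in place of the random matrices, the same structure reproduces the fuse-then-classify flow: project, normalise, combine, and train a linear SVM on the FFVs.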
Pages: 1102-1110
Number of pages: 9