Assisting the Visually Impaired in Multi-object Scene Description Using OWA-Based Fusion of CNN Models

Cited by: 7
Authors
Alhichri, Haikel [1 ]
Bazi, Yakoub [1 ]
Alajlan, Naif [1 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Engn, Adv Lab Intelligent Syst Res ALISR, Riyadh 11543, Saudi Arabia
Keywords
Multi-label image classification; Convolutional neural networks (CNN); VGG16; SqueezeNet; Ordered weighted averaging (OWA); Assistive technology for the visually impaired; BLIND PEOPLE; AGGREGATION; RECOGNITION; INFORMATION;
DOI
10.1007/s13369-020-04799-7
CLC Classification
O (Mathematical Sciences and Chemistry); P (Astronomy and Earth Sciences); Q (Biological Sciences); N (General Natural Sciences);
Subject Classification
07; 0710; 09;
Abstract
Advances in technology can provide considerable support for visually impaired (VI) persons. In particular, computer vision and machine learning can provide solutions for object detection and recognition. In this work, we propose a multi-label image classification solution for assisting a VI person in recognizing the presence of multiple objects in a scene. The solution is based on the fusion of two deep CNN models using the induced ordered weighted averaging (OWA) approach. Specifically, we fuse the outputs of two pre-trained CNN models, VGG16 and SqueezeNet. To use the induced OWA approach, we need to estimate a confidence measure for the outputs of the two CNN base models. To this end, we propose the residual error between the predicted output and the true output as a measure of confidence. We estimate this residual error using another dedicated CNN model that is trained on the residual errors computed from the main CNN models. The OWA technique then uses these estimated residual errors as confidence measures and fuses the decisions of the two main CNN models. When tested on four image datasets of indoor environments from two separate locations, the proposed method improves detection accuracy compared to both base CNN models. The results are also significantly better than state-of-the-art methods reported in the literature.
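The induced OWA fusion described in the abstract can be illustrated with a minimal sketch: model output vectors are reordered by a confidence (inducing) variable, here the negated estimated residual error, and then combined with fixed OWA weights. The specific weight values, confidence scores, and label scores below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def induced_owa_fusion(scores, confidences, weights):
    """Fuse per-class score vectors from several models via induced OWA.

    scores      : (n_models, n_classes) predicted label probabilities
    confidences : (n_models,) inducing variable; here, the negated
                  estimated residual error (higher = more trusted)
    weights     : (n_models,) OWA weights summing to 1, applied to the
                  models after ordering them by confidence
    """
    # Order the models from most to least confident
    order = np.argsort(-np.asarray(confidences, dtype=float))
    ordered = np.asarray(scores, dtype=float)[order]
    # Weighted sum over the ordered score vectors
    return np.asarray(weights, dtype=float) @ ordered

# Hypothetical multi-label outputs of the two base models over 4 objects
vgg16_scores = np.array([0.9, 0.2, 0.7, 0.1])
squeezenet_scores = np.array([0.6, 0.4, 0.8, 0.3])

# Hypothetical confidences (negated residual errors) and OWA weights
fused = induced_owa_fusion(
    scores=[vgg16_scores, squeezenet_scores],
    confidences=[-0.05, -0.20],   # VGG16 estimated as more reliable here
    weights=[0.7, 0.3],           # heavier weight on the more confident model
)
```

Because the weights attach to confidence ranks rather than to fixed models, the fusion automatically leans on whichever model the residual-error estimator currently trusts more for a given image.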
Pages: 10511-10527
Page count: 17