Objects and scenes classification with selective use of central and peripheral image content

Cited by: 8

Authors:

Alameer, Ali [1]

Degenaar, Patrick [1,2]

Nazarpour, Kianoush [1,2]

Affiliations:

[1] Newcastle Univ, Sch Engn, Newcastle upon Tyne NE1 7RU, UK

[2] Newcastle Univ, Inst Neurosci, Newcastle upon Tyne NE2 4HH, UK

Source:

JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION | 2020 / Vol. 66

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Visual recognition; Image understanding; Visual-data reduction; Biological visual-systems; Visual perception; Scene analysis; CONVOLUTIONAL NEURAL-NETWORKS; RECOGNITION; MODEL; REPRESENTATION; AREA; FIXATION; VISION; GIST;

DOI:

10.1016/j.jvcir.2019.102698

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The human visual recognition system is more efficient than any current robotic vision setting. One reason for this superiority is that humans utilize different fields of vision, depending on the recognition task. For instance, experiments on human subjects show that peripheral vision is more useful than central vision in recognizing scenes. We tested our recently developed model, the elastic net-regularized hierarchical MAX (En-HMAX), in recognizing objects and scenes. In various experimental conditions, images were occluded with windows and scotomas of varying sizes. With this model, classification accuracies of up to 90% for objects and scenes were possible. Modelling human experiments, window and scotoma analysis with the En-HMAX model revealed that object and scene recognition are sensitive to the availability of data in the centre and the periphery of the images, respectively. Similarly, results of deep learning models have shown that classification accuracy diminishes dramatically in the absence of peripheral vision. These differences led us to further analyse the performance of the En-HMAX model with the parafoveal versus peripheral areas of vision in a second study. Results of the second study show that approximately 50% of the visual field would be sufficient to achieve 96% accuracy in the classification of unseen images. The En-HMAX model adopts a relative order of importance, similar to the human visual system, depending on the image category. We showed that utilizing the relevant regions of vision can significantly reduce the image processing time and size. © 2019 Elsevier Inc. All rights reserved.
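
The window and scotoma conditions mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the circular mask shape, the grey fill value of 0.5, and the function names `central_mask`, `apply_window`, and `apply_scotoma` are all assumptions made for illustration, since the abstract does not specify them. A window keeps only the central region of an image; a scotoma blanks the central region and keeps the periphery.

```python
import numpy as np

def central_mask(height, width, radius):
    """Boolean mask, True inside a centred circle of `radius` pixels.
    (A circular region is an assumption; the abstract gives no shape.)"""
    yy, xx = np.ogrid[:height, :width]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    return (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2

def apply_window(image, radius, fill=0.5):
    """Window condition: keep the centre, replace the periphery with `fill`."""
    mask = central_mask(image.shape[0], image.shape[1], radius)
    out = np.full_like(image, fill)
    out[mask] = image[mask]
    return out

def apply_scotoma(image, radius, fill=0.5):
    """Scotoma condition: replace the centre with `fill`, keep the periphery."""
    mask = central_mask(image.shape[0], image.shape[1], radius)
    out = image.copy()
    out[mask] = fill
    return out

# Hypothetical usage on a float image with values in [0, 1].
image = np.ones((64, 64), dtype=float)
windowed = apply_window(image, radius=10)
occluded = apply_scotoma(image, radius=10)
```

Varying `radius` over a range of values would give a family of occluded images of the kind the abstract describes, which could then be passed to any classifier.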

References

Pages: 11

85 items in total

[1] Incoherent Dictionary Pair Learning: Application to a Novel Open-Source Database of Chinese Numbers [J]. Abolghasemi, Vahid; Chen, Mingyang; Alameer, Ali; Ferdowsi, Saideh; Chambers, Jonathon; Nazarpour, Kianoush. IEEE SIGNAL PROCESSING LETTERS, 2018, 25 (04): 472-476

[2] Aditya S., 2017, Computer Vision and Image Understanding

[3] Alamdari A., 2015, 2015 10 INT C INT SY, P1

[4] Alameer A, 2017, 2017 IEEE INTERNATIONAL CONFERENCE ON INNOVATIONS IN INTELLIGENT SYSTEMS AND APPLICATIONS (INISTA), P163, DOI 10.1109/INISTA.2017.8001150

[5] Alameer A, 2016, 2016 INTERNATIONAL CONFERENCE FOR STUDENTS ON APPLIED ENGINEERING (ICSAE), P129, DOI 10.1109/ICSAE.2016.7810174

[6] Object Recognition With an Elastic Net-Regularized Hierarchical MAX Model of the Visual Cortex [J]. Alameer, Ali; Ghazaei, Ghazal; Degenaar, Patrick; Chambers, Jonathon A.; Nazarpour, Kianoush. IEEE SIGNAL PROCESSING LETTERS, 2016, 23 (08): 1062-1066

[7] [Anonymous], P 2 IET INT C INT SI

[8] [Anonymous], P NAT ACAD SCI

[9] [Anonymous], 2015, ARXIV151106292, DOI 10.48550/arXiv.1511.06292

[10] [Anonymous], 2019, SCI INF C
