Prediction of visual saliency in video with deep CNNs

Cited by: 1
Authors
Chaabouni, Souad [1 ,2 ]
Benois-Pineau, Jenny [1 ]
Hadar, Ofer [3 ]
Affiliations
[1] LaBRI UMR 5800, 351 Crs Liberat, F-33405 Talence, France
[2] Natl Engn Sch Sfax, REGIM Lab LR11ES48, BP1173, Sfax 3038, Tunisia
[3] Ben Gurion Univ Negev, Commun Syst Engn Dept, IL-84105 Beer Sheva, Israel
Source
APPLICATIONS OF DIGITAL IMAGE PROCESSING XXXIX | 2016 / Vol. 9971
Keywords
Deep Convolutional Neural Networks; residual motion; contrast; visual saliency
DOI
10.1117/12.2238956
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Prediction of visual saliency in images and video is a highly researched topic. Target applications include quality assessment of multimedia services in a mobile context, video compression techniques, recognition of objects in video streams, etc. In the framework of mobile and egocentric perspectives, visual saliency models cannot be founded only on bottom-up features, as suggested by feature integration theory; the central bias hypothesis is not respected either. In this case, the top-down component of human visual attention becomes prevalent, and visual saliency can be predicted on the basis of previously seen data. Deep Convolutional Neural Networks (CNNs) have proven to be a powerful tool for the prediction of salient areas in still images. In our work we also focus on the sensitivity of the human visual system to residual motion in video. We design a deep CNN architecture whose input primary maps are the color values of pixels and the magnitude of local residual motion. Complementary contrast maps allow for a slight increase in accuracy compared to the use of color and residual motion alone. The experiments show that the choice of input features for the deep CNN depends on the visual task: for interest in dynamic content, the 4K model with residual motion is more efficient, while for object recognition in egocentric video the purely spatial input is more appropriate.
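The abstract names three kinds of input primary maps: per-pixel color values, the magnitude of local residual motion, and a complementary contrast map. The sketch below illustrates one plausible way to assemble such an input tensor for a video frame; it is a minimal illustration, not the authors' pipeline. In particular, using Farneback dense optical flow with a median-flow estimate of global (camera) motion as the residual-motion proxy, and windowed standard deviation as the contrast measure, are assumptions made here for concreteness.

```python
import cv2
import numpy as np

def residual_motion_magnitude(prev_gray, curr_gray):
    """Residual motion: dense optical flow minus a global (camera)
    motion estimate. The median flow is a crude global-motion proxy,
    an assumption of this sketch rather than the paper's method."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    global_motion = np.median(flow.reshape(-1, 2), axis=0)  # (dx, dy)
    residual = flow - global_motion                          # per-pixel residual
    return np.linalg.norm(residual, axis=2).astype(np.float32)

def local_contrast(gray, ksize=9):
    """Local luminance contrast as windowed standard deviation:
    sqrt(E[x^2] - E[x]^2) over a ksize x ksize neighborhood."""
    g = gray.astype(np.float32)
    mean = cv2.boxFilter(g, -1, (ksize, ksize))
    mean_sq = cv2.boxFilter(g * g, -1, (ksize, ksize))
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))

def build_input_maps(prev_bgr, curr_bgr):
    """Stack the primary maps named in the abstract into one
    H x W x 5 tensor: 3 color channels, residual-motion magnitude,
    and a contrast map, each normalized to [0, 1]."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)
    color = curr_bgr.astype(np.float32) / 255.0
    motion = residual_motion_magnitude(prev_gray, curr_gray)
    motion /= (motion.max() + 1e-6)
    contrast = local_contrast(curr_gray)
    contrast /= (contrast.max() + 1e-6)
    return np.dstack([color, motion, contrast])              # H x W x 5
```

Patches of such a tensor could then be fed to a CNN trained to classify salient versus non-salient regions; dropping the motion channel recovers the purely spatial input that the abstract reports as more appropriate for object recognition in egocentric video.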
Pages: 14