Salient region detection and segmentation for general object recognition and image understanding

Cited by: 19

Authors:

Huang TieJun [1]

Tian YongHong [1]

Li Jia [2]

Yu HaoNan [1]

Affiliations:

[1] Peking Univ, Sch Elect Engn & Comp Sci, Natl Engn Lab Video Technol, Beijing 100871, Peoples R China

[2] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China

Source:

SCIENCE CHINA-INFORMATION SCIENCES | 2011 / Vol. 54 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

object recognition; image understanding; visual saliency; salient object segmentation; visual dictionary;

DOI:

10.1007/s11432-011-4487-1

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

General object recognition and image understanding is regarded as an ambitious goal for computer vision and multimedia retrieval. Despite great efforts over the last two decades, it remains an open problem. In this paper, we propose a selective attention-driven model for general image understanding, named GORIUM (general object recognition and image understanding model). The key idea of our model is to discover recurring visual objects by selective attention modeling and pairwise local invariant feature matching on a large image set in an unsupervised manner. To this end, it is formulated as a four-layer bottom-up model: salient region detection, object segmentation, automatic object discovery, and visual dictionary construction. By exploiting multi-task learning methods to model visual saliency with bottom-up and top-down factors simultaneously, the lowest layer can effectively detect salient objects in an image. The second layer exploits a simple yet effective learning approach to generate two complementary maps from several raw saliency maps, which can then be used to segment the salient objects precisely from a complex scene. For the third layer, we have implemented an unsupervised approach that automatically discovers general objects from a large image set by pairwise matching with local invariant features. Finally, visual dictionary construction can be implemented using the many state-of-the-art algorithms and tools currently available.


Pages: 2461-2470

Page count: 10
