Learning Intermediate-Level Representations of Form and Motion from Natural Movies

Cited by: 56

Authors:

Cadieu, Charles F. [1]

Olshausen, Bruno A.

Affiliations:

[1] Univ Calif Berkeley, Helen Wills Neurosci Inst, Redwood Ctr Theoret Neurosci, Berkeley, CA 94720 USA

Source:

NEURAL COMPUTATION | 2012 / Vol. 24 / No. 04

Funding:

National Science Foundation (USA);

Keywords:

SLOW FEATURE ANALYSIS; OBJECT RECOGNITION; SIMPLE CELLS; INDEPENDENT COMPONENTS; MODEL; PHASE; INVARIANCE; EMERGENCE; IMAGES; CORTEX;

DOI:

10.1162/NECO_a_00247

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a model of intermediate-level visual representation that is based on learning invariances from movies of the natural environment. The model is composed of two stages of processing: an early feature representation layer and a second layer in which invariances are explicitly represented. Invariances are learned as the result of factoring apart the temporally stable and dynamic components embedded in the early feature representation. The structure contained in these components is made explicit in the activities of second-layer units that capture invariances in both form and motion. When trained on natural movies, the first layer produces a factorization, or separation, of image content into a temporally persistent part representing local edge structure and a dynamic part representing local motion structure, consistent with known response properties in early visual cortex (area V1). This factorization linearizes statistical dependencies among the first-layer units, making them learnable by the second layer. The second-layer units are split into two populations according to the factorization in the first layer. The form-selective units receive their input from the temporally persistent part (local edge structure) and after training result in a diverse set of higher-order shape features consisting of extended contours, multiscale edges, textures, and texture boundaries. The motion-selective units receive their input from the dynamic part (local motion structure) and after training result in a representation of image translation over different spatial scales and directions, in addition to more complex deformations. These representations provide a rich description of dynamic natural images and testable hypotheses regarding intermediate-level representation in visual cortex.
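The factorization described in the abstract can be pictured with a toy example. The sketch below is not the authors' learned model: it assumes, for illustration only, that a first-layer feature is a complex-valued coefficient whose amplitude plays the role of the temporally persistent part and whose frame-to-frame phase shift plays the role of the dynamic part. A fixed cosine/sine filter pair stands in for a learned feature, and all function names (`quadrature_response`, `factorize`, `drifting_grating`) are hypothetical.

```python
import cmath
import math


def quadrature_response(signal, freq):
    """Project a 1-D signal onto a fixed cosine/sine (quadrature) filter pair.

    The pair is a hand-picked stand-in for a learned first-layer feature.
    """
    n = len(signal)
    real = sum(s * math.cos(2 * math.pi * freq * x / n) for x, s in enumerate(signal))
    imag = sum(s * math.sin(2 * math.pi * freq * x / n) for x, s in enumerate(signal))
    return complex(real, imag) * 2 / n


def factorize(coeffs):
    """Split a time series of complex coefficients into two parts.

    amplitudes:   |z_t|, the part expected to stay stable over time.
    phase_shifts: wrapped phase change between consecutive frames,
                  the part expected to carry motion.
    """
    amplitudes = [abs(z) for z in coeffs]
    phase_shifts = [cmath.phase(b * a.conjugate()) for a, b in zip(coeffs, coeffs[1:])]
    return amplitudes, phase_shifts


def drifting_grating(n, freq, shift):
    """One frame of a sinusoidal grating translated by `shift` samples."""
    return [math.cos(2 * math.pi * freq * (x - shift) / n) for x in range(n)]


# A short "movie": the same grating moving one sample per frame.
N, FREQ, STEP = 64, 4, 1
frames = [drifting_grating(N, FREQ, t * STEP) for t in range(5)]
coeffs = [quadrature_response(frame, FREQ) for frame in frames]
amplitudes, phase_shifts = factorize(coeffs)

print("amplitudes:  ", [round(a, 3) for a in amplitudes])
print("phase shifts:", [round(d, 3) for d in phase_shifts])
```

In this toy setting the amplitude stays constant while the grating moves, and the phase advances by the same amount each frame (2π · FREQ · STEP / N radians), so form and motion end up in separate variables. In the model summarized above, the features are learned from natural movies rather than fixed in advance.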


Pages: 827-866

Page count: 40
