Integrating Convolution and Sparse Coding for Learning Low-Dimensional Discriminative Image Representations

Cited by: 1

Authors:

Wei, Xian [1]

Liu, Yingjie [1]

Tang, Xuan [2]

Yu, Shui [3]

Chen, Mingsong [1]

Affiliations:

[1] East China Normal Univ, MoE Engn Res Ctr Hardware Software Codesign Techno, Shanghai 200062, Peoples R China

[2] East China Normal Univ, Sch Commun & Elect Engn, Shanghai 200241, Peoples R China

[3] Univ Technol Sydney, Sch Comp Sci, Sydney, NSW 2007, Australia

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2025 / Vol. 36 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Manifolds; Dictionaries; Feature extraction; Convolutional neural networks; Convolution; Image coding; Training; Convolutional neural network (CNN); discriminative representation learning; geometric optimization; manifold; sparse coding; GENERAL FRAMEWORK; K-SVD; DICTIONARY; ALGORITHM; REDUCTION; RECOGNITION;

DOI:

10.1109/TNNLS.2024.3453374

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work investigates the problem of efficiently learning discriminative low-dimensional (LD) representations of multiclass image objects. We propose a generic end-to-end approach that jointly optimizes sparse dictionary and convolutions for learning LOW-dimensional discriminative image representations, named SparConvLow, taking advantage of convolutional neural networks (CNNs), dictionary learning, and orthogonal projections. The whole learning process can be summarized as follows. First, a CNN module is employed to extract high-dimensional (HD) preliminary convolutional features. Second, to avoid the high computational cost of direct sparse coding on HD CNN features, we learn sparse representation (SR) over a task-driven dictionary in the space with the feature being orthogonally projected. We then exploit the discriminative projection on SR. The whole learning process is consistently treated as an end-to-end joint optimization problem of trace quotient maximization. The cost function is well-defined on the product of the CNN parameters space, the Stiefel manifold, the Oblique manifold, and the Grassmann manifold. By using the explicit gradient delivery, the cost function is optimized via a geometrical stochastic gradient descent (SGD) algorithm along with the chain rule and the backpropagation. The experimental results show that the proposed method can achieve a highly competitive performance with the state-of-the-art (SOTA) image classification, object categorization, and face recognition methods, under both supervised and semi-supervised settings. The code is available at https://github.com/MVPR-Group/SparConvLow.
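The abstract mentions trace quotient maximization and geometric SGD over manifolds such as the Stiefel manifold. The following is only a minimal NumPy sketch of that general idea, not the authors' implementation (the linked repository is the reference for that). It assumes two placeholder symmetric matrices `A` and `B`, a plain gradient-ascent step, and a QR-based retraction; all function names here are hypothetical, and the actual cost function, dictionary update, and CNN coupling are not given in this record.

```python
import numpy as np


def trace_quotient(U, A, B, eps=1e-8):
    """Return tr(U^T A U) / (tr(U^T B U) + eps) for a projection matrix U."""
    return np.trace(U.T @ A @ U) / (np.trace(U.T @ B @ U) + eps)


def trace_quotient_grad(U, A, B, eps=1e-8):
    """Euclidean gradient of trace_quotient w.r.t. U (A and B assumed symmetric)."""
    denom = np.trace(U.T @ B @ U) + eps
    value = np.trace(U.T @ A @ U) / denom
    # Quotient rule: d(a/d) = (2 A U * d - a * 2 B U) / d^2
    return 2.0 * (A @ U - value * (B @ U)) / denom


def qr_retract(M):
    """Map a full-column-rank matrix to one with orthonormal columns via QR."""
    Q, R = np.linalg.qr(M)
    # Flip column signs so that diag(R) is positive, making the result unique.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs


def stiefel_ascent_step(U, egrad, lr):
    """One gradient-ascent step that stays on the Stiefel manifold."""
    # Project the Euclidean gradient onto the tangent space at U.
    UtG = U.T @ egrad
    rgrad = egrad - U @ (UtG + UtG.T) / 2.0
    # Move along the tangent direction, then retract back to the manifold.
    return qr_retract(U + lr * rgrad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((6, 6))
    A = X @ X.T          # placeholder symmetric matrix (assumption)
    B = np.eye(6)        # placeholder symmetric matrix (assumption)
    U = qr_retract(rng.standard_normal((6, 2)))
    for _ in range(100):
        U = stiefel_ascent_step(U, trace_quotient_grad(U, A, B), lr=0.01)
    print("quotient:", trace_quotient(U, A, B))
    print("orthonormal:", np.allclose(U.T @ U, np.eye(2)))
```

The retraction keeps the columns of `U` orthonormal after every update, which is the constraint that distinguishes a geometric step from an ordinary Euclidean one.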


Pages: 12483-12496

Page count: 14

References: 70 in total

[1] Visual tracking using convolutional features with sparse coding [J].

Abbass, Mohammed Y.; Kwon, Ki-Chul; Kim, Nam; Abdelwahab, Safey A.; Abd El-Samie, Fathi E.; Khalaf, Ashraf A. M.

ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (05) :3349-3360

[2] Absil PA, 2008, OPTIMIZATION ALGORITHMS ON MATRIX MANIFOLDS, P1

[3] K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation [J].

Aharon, Michal; Elad, Michael; Bruckstein, Alfred.

IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2006, 54 (11) :4311-4322

[4] Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection [J].

Belhumeur, PN; Hespanha, JP; Kriegman, DJ.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (07) :711-720

[5] Belkin M, 2002, ADV NEUR IN, V14, P585

[6] Representation Learning: A Review and New Perspectives [J].

Bengio, Yoshua; Courville, Aaron; Vincent, Pascal.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2013, 35 (08) :1798-1828

[7] Berthelot D, 2019, ADV NEUR IN, V32

[8] Geometric Deep Learning: Going beyond Euclidean data [J].

Bronstein, Michael M.; Bruna, Joan; LeCun, Yann; Szlam, Arthur; Vandergheynst, Pierre.

IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (04) :18-42

[9] Cai D, 2007, IEEE I CONF COMP VIS, P222

[10] Cao Z., 2019, arXiv
