Multi-Objective Matrix Normalization for Fine-Grained Visual Recognition

Cited by: 63
Authors
Min, Shaobo [1]
Yao, Hantao [2]
Xie, Hongtao [1]
Zha, Zheng-Jun [1]
Zhang, Yongdong [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Peoples R China
[2] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100864, Peoples R China
Keywords
Visualization; Graphics processing units; Feature extraction; Convergence; Optimization; Covariance matrices; Training; Fine-grained visual recognition; bilinear pooling; matrix normalization; multi-objective optimization
DOI
10.1109/TIP.2020.2977457
CLC number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Bilinear pooling achieves great success in fine-grained visual recognition (FGVC). Recent methods have shown that matrix power normalization can stabilize the second-order information in bilinear features, but some problems, e.g., redundant information and over-fitting, remain to be resolved. In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity. These three regularizers not only stabilize the second-order information, but also compact the bilinear features and promote model generalization. In MOMN, a core challenge is how to jointly optimize three non-smooth regularizers with different convexity properties. To this end, MOMN first formulates them into an augmented Lagrange formula with approximated regularizer constraints. Then, auxiliary variables are introduced to relax the different constraints, which allows each regularizer to be solved alternately. Finally, several updating strategies based on gradient descent are designed to obtain consistent convergence and an efficient implementation. Consequently, MOMN is implemented with only matrix multiplication, which is well suited to GPU acceleration, and the normalized bilinear features are stabilized and discriminative. Experiments on five public benchmarks for FGVC demonstrate that the proposed MOMN is superior to existing normalization-based methods in terms of both accuracy and efficiency. The code is available at https://github.com/mboboGO/MOMN.
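As a rough illustration of two ingredients the abstract mentions, the sketch below shows a square-root normalization computed with matrix multiplications only (the coupled Newton-Schulz iteration, a standard technique) and an element-wise soft-thresholding step (the usual proximal operator for an L1 sparsity penalty). This is not the MOMN update rule from the paper, and it omits the low-rank term and the augmented Lagrangian coupling entirely. All function names are hypothetical, and pure-Python lists stand in for GPU tensors.

```python
# Illustrative sketch only; NOT the MOMN algorithm. All names are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def identity(n):
    """Return the n x n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def newton_schulz_sqrt(a, num_iters=20):
    """Approximate the square root of a symmetric positive-definite matrix
    using only matrix multiplications (coupled Newton-Schulz iteration)."""
    n = len(a)
    # Pre-normalize by the Frobenius norm so the iteration converges.
    norm = sum(v * v for row in a for v in row) ** 0.5
    y = [[v / norm for v in row] for row in a]
    z = identity(n)
    eye = identity(n)
    for _ in range(num_iters):
        zy = matmul(z, y)
        # t = 0.5 * (3I - ZY)
        t = [[0.5 * (3.0 * eye[i][j] - zy[i][j]) for j in range(n)]
             for i in range(n)]
        y = matmul(y, t)  # Y converges to sqrt(A / norm)
        z = matmul(t, z)  # Z converges to the inverse of that square root
    # Undo the pre-normalization: sqrt(A) = sqrt(norm) * Y.
    scale = norm ** 0.5
    return [[scale * v for v in row] for row in y]


def soft_threshold(a, tau):
    """Element-wise soft-thresholding: shrink each entry toward zero by tau."""
    return [[(abs(v) - tau) * (1.0 if v > 0 else -1.0) if abs(v) > tau else 0.0
             for v in row] for row in a]


if __name__ == "__main__":
    cov = [[4.0, 1.0], [1.0, 3.0]]  # toy stand-in for a bilinear feature matrix
    root = newton_schulz_sqrt(cov)
    print(matmul(root, root))         # should be close to cov
    print(soft_threshold(root, 0.1))  # small entries are shrunk toward zero
```

Because every step in `newton_schulz_sqrt` is a matrix product or an element-wise operation, the same structure maps directly onto batched GPU kernels, which is the property the abstract highlights.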
Pages: 4996-5009
Page count: 14