LQF: Linear Quadratic Fine-Tuning

Times cited: 12

Authors:

Achille, Alessandro [1]

Golatkar, Aditya [1,2]

Ravichandran, Avinash [1]

Polito, Marzia [1]

Soatto, Stefano [1]

Affiliations:

[1] Amazon Web Services, Seattle, WA 98109, USA

[2] University of California, Los Angeles, Los Angeles, CA 90024, USA

Source:

2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2021), 2021


DOI:

10.1109/CVPR46437.2021.01547

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Classifiers that are linear in their parameters, and trained by optimizing a convex loss function, have predictable behavior with respect to changes in the training data, initial conditions, and optimization. Such desirable properties are absent in deep neural networks (DNNs), which are typically trained by non-linear fine-tuning of a pre-trained model. Previous attempts to linearize DNNs have led to interesting theoretical insights, but have not impacted practice due to the substantial performance gap compared to standard non-linear optimization. We present the first method for linearizing a pre-trained model that achieves performance comparable to non-linear fine-tuning on most of the real-world image classification tasks tested, thus enjoying the interpretability of linear models without incurring punishing losses in performance. LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification: Leaky-ReLU instead of ReLU, mean squared loss instead of cross-entropy, and pre-conditioning using Kronecker factorization. None of these changes in isolation is sufficient to approach the performance of non-linear fine-tuning. When used in combination, they allow us to reach comparable performance, and even superior performance in the low-data regime, while enjoying the simplicity, robustness and interpretability of linear-quadratic optimization.
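To make the "linear-quadratic" idea in the abstract concrete, here is a minimal sketch in pure Python, assuming that "linearizing a pre-trained model" means a first-order Taylor expansion of the network around its pre-trained weights w0, i.e. f_lin(x; w0 + delta) = f(x; w0) + J(x; w0) . delta. Combined with a mean squared loss, training becomes a convex quadratic problem in delta. The toy one-hidden-layer network, the random stand-in "pre-trained" weights, and all names (`LinearizedModel`, `ALPHA`, `train`, and so on) are hypothetical illustrations, not the authors' code. Leaky-ReLU is used because with plain ReLU, units inactive at w0 would give all-zero Jacobian entries for their incoming weights. The Kronecker-factored pre-conditioning is not shown; plain gradient descent stands in for it.

```python
import random

# Hypothetical toy example; ALPHA is an illustrative Leaky-ReLU slope, not from the paper.
ALPHA = 0.1


def leaky_relu(z: float) -> float:
    return z if z > 0 else ALPHA * z


def leaky_relu_grad(z: float) -> float:
    return 1.0 if z > 0 else ALPHA


def hidden(x, w1):
    # Pre-activations h = W1 x of a one-hidden-layer toy network.
    return [sum(wjk * xk for wjk, xk in zip(row, x)) for row in w1]


def forward(x, w1, w2):
    """Toy non-linear network f(x; W1, w2) = w2 . leaky_relu(W1 x)."""
    return sum(w2j * leaky_relu(hj) for w2j, hj in zip(w2, hidden(x, w1)))


def jacobian(x, w1, w2):
    """df/dw for all parameters, flattened as [W1 (row-major), w2]."""
    h = hidden(x, w1)
    d_w1 = [w2[j] * leaky_relu_grad(h[j]) * xk for j in range(len(w1)) for xk in x]
    d_w2 = [leaky_relu(hj) for hj in h]
    return d_w1 + d_w2


class LinearizedModel:
    """First-order Taylor expansion around the 'pre-trained' weights w0:
    f_lin(x; w0 + delta) = f(x; w0) + J(x; w0) . delta, which is linear in delta."""

    def __init__(self, w1, w2):
        self.w1, self.w2 = w1, w2
        self.n_params = len(w1) * len(w1[0]) + len(w2)

    def features(self, x):
        # Both terms depend only on x and the fixed w0, never on delta.
        return forward(x, self.w1, self.w2), jacobian(x, self.w1, self.w2)

    def predict(self, x, delta):
        f0, j = self.features(x)
        return f0 + sum(jk * dk for jk, dk in zip(j, delta))


def mse(model, data, delta):
    return sum((model.predict(x, delta) - y) ** 2 for x, y in data) / len(data)


def train(model, data, steps=500):
    """Gradient descent on the MSE, which is a convex quadratic in delta."""
    n = len(data)
    feats = [(model.features(x), y) for x, y in data]  # computed once
    # The Hessian H = (2/n) * sum_i J_i J_i^T is constant; lr = 1/trace(H)
    # gives lr * lambda_max(H) <= 1, so no step can increase the loss.
    trace_h = 2.0 / n * sum(sum(jk * jk for jk in j) for (_, j), _ in feats)
    lr = 1.0 / trace_h
    delta = [0.0] * model.n_params
    for _ in range(steps):
        grad = [0.0] * model.n_params
        for (f0, j), y in feats:
            r = f0 + sum(jk * dk for jk, dk in zip(j, delta)) - y  # residual
            for k, jk in enumerate(j):
                grad[k] += 2.0 * r * jk / n
        delta = [dk - lr * gk for dk, gk in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-ins for pre-trained weights and a small labelled dataset.
    w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
    w2 = [rng.gauss(0, 1) for _ in range(4)]
    data = [([rng.gauss(0, 1) for _ in range(3)], rng.gauss(0, 1)) for _ in range(8)]
    model = LinearizedModel(w1, w2)
    delta = train(model, data)
    print("MSE before:", mse(model, data, [0.0] * model.n_params))
    print("MSE after: ", mse(model, data, delta))
```

Running the file prints the training MSE before and after fitting. The second value should be lower, since gradient descent on a convex quadratic with this step size never increases the loss. Because the Jacobian features are fixed at w0, the fit behaves like linear least squares rather than non-linear fine-tuning.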


Pages: 15724-15734

Number of pages: 11

Cited references: 47

