A Study on Truncated Newton Methods for Linear Classification

被引:16
作者
Galli, Leonardo [1 ]
Lin, Chih-Jen [2 ]
机构
[1] Univ Florence, Dept Informat Engn, I-50139 Florence, Italy
[2] Natl Taiwan Univ, Dept Comp Sci, Taipei 106, Taiwan
关键词
Convergence; Linear systems; Approximation algorithms; Mathematical model; Logistics; Support vector machines; Software; Conjugate gradient (CG); linear classification; preconditioning; truncated Newton (TN); truncation criteria; FORCING TERMS; CHOICE;
D O I
10.1109/TNNLS.2020.3045836
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Truncated Newton (TN) methods have been a useful technique for large-scale optimization. Instead of obtaining the full Newton direction, a truncated method approximately solves the Newton equation with an inner conjugate gradient (CG) procedure (TNCG for the whole method). These methods have been employed to efficiently solve linear classification problems. However, even in this deeply studied field, various theoretical and numerical aspects were not completely explored. The first contribution of this work is to comprehensively study the global and local convergence when TNCG is applied to linear classification. Because of the lack of twice differentiability under some losses, many past works cannot be applied here. We prove various missing pieces of theory from scratch and clarify many proper references. The second contribution is to study the termination of the CG method. For the first time when TNCG is applied to linear classification, we show that the inner stopping condition strongly affects the convergence speed. We propose using a quadratic stopping criterion to achieve both robustness and efficiency. The third contribution is that of combining the study on inner stopping criteria with that of preconditioning. We discuss how convergence theory is affected by preconditioning and finally propose an effective preconditioned TNCG.
引用
收藏
页码:2828 / 2841
页数:14
相关论文
共 31 条
[1]   A choice of forcing terms in inexact Newton method [J].
An, Heng-Bin ;
Mo, Ze-Yao ;
Liu, Xing-Ping .
JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2007, 200 (01) :47-60
[2]   A choice of forcing terms in inexact Newton iterations with application to pseudo-transient continuation for incompressible fluid flow computations [J].
Botti, L. .
APPLIED MATHEMATICS AND COMPUTATION, 2015, 266 :713-737
[3]   Warm Start for Parameter Selection of Linear Classifiers [J].
Chu, Bo-Yu ;
Ho, Chia-Hua ;
Tsai, Cheng-Hao ;
Lin, Chieh-Yen ;
Lin, Chih-Jen .
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, :149-158
[4]  
Clarke F. H., 1990, Classics in applied mathematics), V5
[5]  
Concus P., 1976, Sparse Matrix Computations
[6]  
Defazio A, 2014, ADV NEUR IN, V27
[7]   INEXACT NEWTON METHODS [J].
DEMBO, RS ;
EISENSTAT, SC ;
STEIHAUG, T .
SIAM JOURNAL ON NUMERICAL ANALYSIS, 1982, 19 (02) :400-408
[8]   Choosing the forcing terms in an inexact Newton method [J].
Eisenstat, SC ;
Walker, HF .
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 1996, 17 (01) :16-32
[9]   MINIMIZATION OF SC1 FUNCTIONS AND THE MARATOS EFFECT [J].
FACCHINEI, F .
OPERATIONS RESEARCH LETTERS, 1995, 17 (03) :131-137
[10]  
Facchinei F, 2003, FINITE DIMENSIONAL V, VII