Optimal stochastic gradient descent algorithm for filtering

Cited by: 1
Authors
Turali, M. Yigit [1 ]
Koc, Ali T. [1 ]
Kozat, Suleyman S. [1 ]
Affiliations
[1] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Ankara, Turkiye
Keywords
Learning rate; Linear filtering; Optimization; Stochastic gradient descent; Prediction
DOI
10.1016/j.dsp.2024.104731
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification
0808; 0809
Abstract
Stochastic Gradient Descent (SGD) is a fundamental optimization technique in machine learning due to its efficiency in handling large-scale data. Unlike typical SGD applications, which rely on stochastic approximations, this work analyzes the convergence properties of SGD from a deterministic perspective. We address the crucial issue of learning rate selection, a common obstacle in optimizing SGD performance, particularly in complex environments. In contrast to traditional methods that often provide convergence results based on statistical expectations (which are usually not justified), our approach introduces universally applicable learning rates. These rates ensure that a model trained with SGD asymptotically matches the performance of the best linear filter, regardless of the data sequence length and without statistical assumptions about the data. By establishing learning rates that scale as μ = O(1/t), we offer a solution that sidesteps the need for prior knowledge of the data, a prevalent limitation in real-world applications. In this way, we provide a robust framework for applying SGD across varied settings, guaranteeing convergence results that hold in both deterministic and stochastic scenarios without any underlying assumptions.
Pages: 6
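As an illustrative sketch only (not the authors' implementation), the following Python snippet runs online SGD on a synthetic linear-filtering problem with a decaying learning rate μ_t = μ0/t and compares the result to the best linear filter computed in hindsight. The data model, dimensions, noise level, and the constant μ0 are hypothetical choices for demonstration, not values taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-filtering data: y_t = w_true^T x_t + noise.
# w_true and the noise level are illustrative, not from the paper.
T, d = 10_000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(T, d))
y = X @ w_true + 0.1 * rng.normal(size=T)

# Online SGD with a decaying learning rate mu_t = mu0 / t,
# processing one sample per step.
w = np.zeros(d)
mu0 = 0.5  # hypothetical constant; the paper derives its own scaling
for t in range(1, T + 1):
    x_t, y_t = X[t - 1], y[t - 1]
    err = y_t - w @ x_t          # instantaneous prediction error
    w += (mu0 / t) * err * x_t   # gradient step on the squared loss

# Compare against the best linear filter in hindsight (least squares).
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("SGD estimate:           ", np.round(w, 3))
print("Best linear filter (LS):", np.round(w_ls, 3))

With these settings the SGD iterate approaches the least-squares solution as more samples are processed, which is the kind of asymptotic behavior the abstract describes; the exact learning-rate constants required by the paper's guarantees are given in the paper itself.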