Optimal stochastic gradient descent algorithm for filtering

Cited by: 1
Authors
Turali, M. Yigit [1 ]
Koc, Ali T. [1 ]
Kozat, Suleyman S. [1 ]
Affiliations
[1] Bilkent Univ, Dept Elect & Elect Engn, TR-06800 Ankara, Turkiye
Keywords
Learning rate; Linear filtering; Optimization; Stochastic gradient descent; Prediction
DOI
10.1016/j.dsp.2024.104731
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification
0808; 0809
Abstract
Stochastic Gradient Descent (SGD) is a fundamental optimization technique in machine learning due to its efficiency in handling large-scale data. Unlike typical SGD applications, which rely on stochastic approximations, this work analyzes the convergence properties of SGD from a deterministic perspective. We address the crucial issue of learning rate selection, a common obstacle in optimizing SGD performance, particularly in complex environments. In contrast to traditional methods that often provide convergence results based on statistical expectations (which are usually not justified), our approach introduces universally applicable learning rates. These rates ensure that a model trained with SGD asymptotically matches the performance of the best linear filter, regardless of the data sequence length and without statistical assumptions about the data. By establishing learning rates that scale as μ = O(1/t), we offer a solution that sidesteps the need for prior knowledge of the data, a prevalent limitation in real-world applications. In this way, we provide a robust framework for applying SGD across varied settings, guaranteeing convergence results that hold in both deterministic and stochastic scenarios without any underlying assumptions.
Pages: 6
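As an illustrative sketch only (not the authors' implementation), the following Python snippet runs online SGD on a synthetic linear-filtering problem with a decaying learning rate μ_t = μ0/t and compares the result to the best linear filter computed in hindsight. The data model, dimensions, noise level, and the constant μ0 are hypothetical choices for demonstration, not values taken from the paper.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-filtering data: y_t = w_true^T x_t + noise.
# w_true and the noise level are illustrative, not from the paper.
T, d = 10_000, 5
w_true = rng.normal(size=d)
X = rng.normal(size=(T, d))
y = X @ w_true + 0.1 * rng.normal(size=T)

# Online SGD with a decaying learning rate mu_t = mu0 / t,
# processing one sample per step.
w = np.zeros(d)
mu0 = 0.5  # hypothetical constant; the paper derives its own scaling
for t in range(1, T + 1):
    x_t, y_t = X[t - 1], y[t - 1]
    err = y_t - w @ x_t          # instantaneous prediction error
    w += (mu0 / t) * err * x_t   # gradient step on the squared loss

# Compare against the best linear filter in hindsight (least squares).
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print("SGD estimate:           ", np.round(w, 3))
print("Best linear filter (LS):", np.round(w_ls, 3))

With these settings the SGD iterate approaches the least-squares solution as more samples are processed, which is the kind of asymptotic behavior the abstract describes; the exact learning-rate constants required by the paper's guarantees are given in the paper itself.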