Online Deterministic Annealing for Classification and Clustering

Cited by: 8
Authors
Mavridis, Christos N. [1 ,2 ]
Baras, John S. [1 ,2 ]
Affiliations
[1] Univ Maryland, Dept Elect & Comp Engn, College Pk, MD 20742 USA
[2] Univ Maryland, Inst Syst Res, College Pk, MD 20742 USA
Keywords
Optimization; Vector quantization; Distortion measurement; Stochastic processes; Machine learning algorithms; Approximation algorithms; Data models; Annealing optimization; Bregman divergences; classification; clustering; machine learning algorithms; progressive learning; OPTIMIZATION; ROBUSTNESS;
DOI
10.1109/TNNLS.2021.3138676
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Inherent in virtually every iterative machine learning algorithm is the problem of hyperparameter tuning, which includes three major design parameters: 1) the complexity of the model, e.g., the number of neurons in a neural network; 2) the initial conditions, which heavily affect the behavior of the algorithm; and 3) the dissimilarity measure used to quantify its performance. We introduce an online prototype-based learning algorithm that can be viewed as a progressively growing competitive-learning neural network architecture for classification and clustering. The learning rule of the proposed approach is formulated as an online gradient-free stochastic approximation algorithm that solves a sequence of appropriately defined optimization problems, simulating an annealing process. The annealing nature of the algorithm contributes to avoiding poor local minima, offers robustness with respect to the initial conditions, and provides a means to progressively increase the complexity of the learning model, through an intuitive bifurcation phenomenon. The proposed approach is interpretable, requires minimal hyperparameter tuning, and allows online control over the performance-complexity tradeoff. Finally, we show that Bregman divergences appear naturally as a family of dissimilarity measures that play a central role in both the performance and the computational complexity of the learning algorithm.
Pages: 7125-7134
Number of pages: 10
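The deterministic-annealing idea summarized in the abstract can be illustrated with a minimal batch sketch: prototypes start together, soft (Gibbs) associations are computed at a temperature T, and as T is lowered the prototypes split apart through bifurcations. This is an illustrative sketch only, not the authors' online gradient-free stochastic approximation algorithm; all function and parameter names here are assumptions, and squared-Euclidean distortion (the simplest Bregman divergence) stands in for the general case.

```python
import numpy as np

def deterministic_annealing_clustering(X, n_prototypes=4, T0=1.0, T_min=0.01,
                                       cooling=0.8, inner_iters=50, seed=0):
    """Toy batch deterministic-annealing clustering with squared-Euclidean
    distortion (a Bregman divergence). Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    # All prototypes start near the data mean; as the temperature T drops
    # below critical values they split apart (the bifurcation phenomenon).
    mu = X.mean(axis=0) + 1e-3 * rng.standard_normal((n_prototypes, X.shape[1]))
    T = T0
    while T > T_min:
        for _ in range(inner_iters):
            # Pairwise squared-Euclidean distortions, shape (n_samples, n_prototypes).
            d = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            # Gibbs association probabilities at temperature T
            # (subtract the row minimum for numerical stability).
            p = np.exp(-(d - d.min(axis=1, keepdims=True)) / T)
            p /= p.sum(axis=1, keepdims=True)
            # Each prototype moves to its probability-weighted mean, which
            # minimizes the expected squared-Euclidean distortion.
            w = p.sum(axis=0) + 1e-12
            mu = (p.T @ X) / w[:, None]
        T *= cooling  # geometric cooling schedule
    return mu
```

In the paper's online setting, the batch centroid update above is replaced by a gradient-free stochastic approximation step applied per sample, and the squared-Euclidean distortion can be swapped for other Bregman divergences (e.g., KL divergence), which the authors show governs both performance and computational complexity.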