Enhancing continuous time series modelling with a latent ODE-LSTM approach

Cited by: 7
Authors
Coelho, C. [1]
Costa, M. Fernanda P. [1]
Ferras, L. L. [1,2]
Affiliations
[1] Univ Minho, Ctr Math CMAT, P-4710057 Braga, Portugal
[2] Univ Porto, Dept Mech Engn, Sect Math, Porto, Portugal
Keywords
Machine learning; Neural ODE; Latent ODE; RNN; LSTM; Latent ODE-LSTM; Gradient clipping
DOI
10.1016/j.amc.2024.128727
Chinese Library Classification: O29 [Applied Mathematics]
Subject classification code: 070104
Abstract
Continuous Time Series (CTS) arise in many applications and exhibit dynamic properties such as irregular sampling rates and high-frequency sampling. Since CTS with irregular sampling rates are difficult to model with standard Recurrent Neural Networks (RNNs), RNNs have been generalised to have continuous-time hidden dynamics defined by a Neural Ordinary Differential Equation (Neural ODE), leading to the ODE-RNN model. Another approach that provides better modelling is the Latent ODE model, which constructs a continuous-time model in which a latent state is defined at all times. The Latent ODE model uses a standard RNN as the encoder and a Neural ODE as the decoder. However, since the RNN encoder leads to difficulties with missing data and ill-defined latent variables, a Latent ODE-RNN model has recently been proposed that uses an ODE-RNN model as the encoder instead. Both the Latent ODE and Latent ODE-RNN models are difficult to train due to the vanishing and exploding gradients problem. To overcome this problem, the main contribution of this paper is to propose and illustrate a new Latent ODE model that uses an ODE-LSTM (Long Short-Term Memory) network as the encoder - the Latent ODE-LSTM model. To limit the growth of the gradients, the Norm Gradient Clipping strategy was embedded in the Latent ODE-LSTM model. The performance of the new Latent ODE-LSTM (with and without Norm Gradient Clipping) is then evaluated for modelling CTS with regular and irregular sampling rates. Numerical experiments show that the new Latent ODE-LSTM outperforms Latent ODE-RNNs and avoids vanishing and exploding gradients during training. Code implementations developed in this work are available at github.com/CeciliaCoelho/LatentODELSTM.
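To make the encoder idea concrete, the Python sketch below outlines one observation step of an ODE-LSTM cell (a discrete LSTM update at each observation, followed by continuous-time evolution of the hidden state over the irregular gap to the next observation) together with norm gradient clipping in a training step. This is a minimal illustration only, not the code from the linked repository: it assumes PyTorch and the torchdiffeq package, and the class and function names (HiddenDynamics, ODELSTMCell, train_step) are hypothetical.

import torch
import torch.nn as nn
from torchdiffeq import odeint  # assumed ODE solver backend


class HiddenDynamics(nn.Module):
    """Neural ODE vector field governing the hidden state between observations."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
                                 nn.Linear(hidden_dim, hidden_dim))

    def forward(self, t, h):
        return self.net(h)


class ODELSTMCell(nn.Module):
    """LSTM update at each observation; hidden state then evolved by a Neural ODE
    over the (possibly irregular) time gap delta_t until the next observation."""
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.lstm = nn.LSTMCell(input_dim, hidden_dim)
        self.dynamics = HiddenDynamics(hidden_dim)

    def forward(self, x, h, c, delta_t):
        h, c = self.lstm(x, (h, c))            # discrete LSTM update at observation time
        ts = torch.tensor([0.0, float(delta_t)])
        h = odeint(self.dynamics, h, ts)[-1]   # continuous evolution of h over the gap
        return h, c


def train_step(model, optimizer, loss_fn, batch, max_norm=1.0):
    """One optimisation step with norm gradient clipping (illustrative only)."""
    optimizer.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    # Rescale gradients so their global L2 norm does not exceed max_norm,
    # limiting gradient growth during backpropagation through time.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()

In the Latent ODE-LSTM described above, such an encoder is run over the observed sequence to produce the latent initial state, which a Neural ODE decoder then evolves forward in time.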
Pages: 22