Aggregated Wasserstein Distance and State Registration for Hidden Markov Models

Cited by: 22
Authors
Chen, Yukun [1]
Ye, Jianbo [1]
Li, Jia [2]
Affiliations
[1] Penn State Univ, Coll Informat Sci & Technol, State Coll, PA 16801 USA
[2] Penn State Univ, Dept Stat, State Coll, PA 16801 USA
Funding
U.S. National Science Foundation
Keywords
Hidden Markov models; Monte Carlo methods; Gaussian distribution; Measurement; Computational modeling; Approximation methods; Markov processes; Hidden Markov model; Gaussian mixture model; Wasserstein distance; optimal transport; KULLBACK-LEIBLER DIVERGENCE; HMM; RECOGNITION; TRANSPORT;
DOI
10.1109/TPAMI.2019.2908635
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two Hidden Markov Models (HMMs) with state conditional distributions being Gaussian. For such HMMs, the marginal distribution at any time position follows a Gaussian mixture distribution, a fact exploited to softly match, a.k.a. register, the states in two HMMs. We refer to such HMMs as Gaussian mixture model-HMMs (GMM-HMMs). The registration of states is inspired by the intrinsic relationship between optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem where the cost between components is the Wasserstein metric for Gaussian distributions. The solution of the optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of states. The distance is defined meaningfully even for two HMMs estimated from data of different dimensionality, a situation that can arise due to missing variables. This distance quantifies the dissimilarity of HMMs by measuring both the difference between the two marginal GMMs and that between the two transition matrices. Our new distance is tested on tasks of retrieval, classification, and t-SNE visualization of time series. Experiments on both synthetic and real data demonstrate its advantages in accuracy as well as efficiency in comparison with existing distances based on the Kullback-Leibler divergence.
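To make the registration step described in the abstract concrete, the following is a minimal sketch, not the authors' implementation: it matches the components of two marginal GMMs by solving a discrete optimal transport problem whose ground cost is the closed-form 2-Wasserstein distance between Gaussians. It assumes NumPy/SciPy and a generic linear-programming solver; the helper names gaussian_w2_sq and register_states are hypothetical.

```python
import numpy as np
from scipy.linalg import sqrtm
from scipy.optimize import linprog

def gaussian_w2_sq(m1, S1, m2, S2):
    # Closed-form squared 2-Wasserstein distance between N(m1, S1)
    # and N(m2, S2):
    #   ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2}).
    r1 = sqrtm(S1)
    cross = sqrtm(r1 @ S2 @ r1)  # may carry a tiny imaginary part numerically
    return float(np.sum((m1 - m2) ** 2)
                 + np.trace(S1 + S2 - 2.0 * np.real(cross)))

def register_states(w1, means1, covs1, w2, means2, covs2):
    # Soft registration of GMM components: solve the optimal transport
    # LP between component weights w1 and w2 under Gaussian W2 costs.
    K1, K2 = len(w1), len(w2)
    cost = np.array([[gaussian_w2_sq(means1[i], covs1[i],
                                     means2[j], covs2[j])
                      for j in range(K2)] for i in range(K1)])
    # Equality constraints: coupling rows sum to w1, columns to w2.
    A_eq = np.zeros((K1 + K2, K1 * K2))
    for i in range(K1):
        A_eq[i, i * K2:(i + 1) * K2] = 1.0
    for j in range(K2):
        A_eq[K1 + j, j::K2] = 1.0
    b_eq = np.concatenate([w1, w2])
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    coupling = res.x.reshape(K1, K2)
    return coupling, float(cost.ravel() @ res.x)

# Toy usage: register the marginal GMMs of two 2-state Gaussian HMMs.
w1, w2 = np.array([0.6, 0.4]), np.array([0.5, 0.5])
means1 = [np.zeros(2), np.array([3.0, 0.0])]
covs1 = [np.eye(2), np.eye(2)]
means2 = [np.array([0.2, 0.1]), np.array([3.1, -0.1])]
covs2 = [np.eye(2), 1.5 * np.eye(2)]
coupling, gmm_cost = register_states(w1, means1, covs1, w2, means2, covs2)
```

The returned coupling plays the role of the soft state registration, and gmm_cost is the fast approximation to the Wasserstein metric between the two marginal GMMs. Per the abstract, the full Aggregated Wasserstein distance additionally compares the two transition matrices under this registration, which this sketch omits.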
Pages: 2133-2147
Number of pages: 15