Finding Principal Paths in Data Space

被引:18
作者
Ferrarotti, Marco Jacopo [1 ]
Rocchia, Walter [1 ]
Decherchi, Sergio [2 ]
机构
[1] Fdn Ist Italiano Tecnol, Computat mOdelling NanosCalE & bioPhys sysT Lab, I-16163 Genoa, Italy
[2] Fdn Ist Italiano Tecnol, Computat & Chem Biol Grp, I-16163 Genoa, Italy
关键词
Kernel methods; maximal evidence; regularized kernel k-means (RKKM); unsupervised learning; FIT;
D O I
10.1109/TNNLS.2018.2884792
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we introduce the concept of principal paths in data space; we show that this is a well-characterized problem from the point of view of cognition, and that it can lead to salient insights in the analyzed data enabling topological/ holistic descriptions. These paths, interestingly, can be interpreted as local principal curves, and in this paper, we suggest that they are analogous to what, in the statistical mechanics realm, are called minimum free-energy paths. Here, we move that concept from physics to data space and compute them in both the original and the kernel space. The algorithm is a regularized version of the well-known k-means clustering algorithm. The regularization parameter is derived via an in-sample model selection process based on the Bayesian evidence maximization. Interestingly, we show that this choice for the regularization parameter consistently leads to the same manifold even when changing the number of clusters. We apply the method to common data sets, dynamical systems, and, in particular, to molecular dynamics trajectories showing the generality, the usefulness of the approach and its superiority with respect to other related approaches.
引用
收藏
页码:2449 / 2462
页数:14
相关论文
共 34 条
[1]  
[Anonymous], 2008, Principal manifolds for data visualization and dimension reduction, DOI DOI 10.1007/978-3-540-73750-6_5
[2]  
Arthur D., 2007, P 18 ANN ACM SIAM S, V8, P1027, DOI DOI 10.1145/1283383.1283494
[3]  
Bottou L., 1995, Advances in Neural Information Processing Systems 7, P585
[4]   TOPOLOGY AND DATA [J].
Carlsson, Gunnar .
BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 2009, 46 (02) :255-308
[5]   The ligand binding mechanism to purine nucleoside phosphorylase elucidated via molecular dynamics and machine learning [J].
Decherchi, Sergio ;
Berteotti, Anna ;
Bottegoni, Giovanni ;
Rocchia, Walter ;
Cavalli, Andrea .
NATURE COMMUNICATIONS, 2015, 6
[6]   Learning the mean: A neural network approach [J].
Decherchi, Sergio ;
Parodi, Mauro ;
Ridella, Sandro .
NEUROCOMPUTING, 2012, 77 (01) :129-143
[7]   Using Unsupervised Analysis to Constrain Generalization Bounds for Support Vector Classifiers [J].
Decherchi, Sergio ;
Ridella, Sandro ;
Zunino, Rodolfo ;
Gastaldo, Paolo ;
Anguita, Davide .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 2010, 21 (03) :424-438
[8]  
Dijkstra E. W., 1959, NUMER MATH, V1, P269, DOI [10.1007/BF01386390, DOI 10.1007/BF01386390]
[9]  
Durbin J., 1960, ECONOMETRICA, V28, P233
[10]   A METHOD FOR DETERMINING REACTION PATHS IN LARGE MOLECULES - APPLICATION TO MYOGLOBIN [J].
ELBER, R ;
KARPLUS, M .
CHEMICAL PHYSICS LETTERS, 1987, 139 (05) :375-380