On clustering uncertain and structured data with Wasserstein barycenters and a geodesic criterion for the number of clusters

Cited by: 3
Authors
Papayiannis, G. I. [1, 3]
Domazakis, G. N. [1, 4]
Drivaliaris, D. [5]
Koukoulas, S. [6]
Tsekrekos, A. E. [2]
Yannacopoulos, A. N. [1]
Affiliations
[1] Athens Univ Econ & Business, Stochast Modeling & Applicat Lab, Dept Stat, Athens, Greece
[2] Athens Univ Econ & Business, Dept Accounting & Finance, Athens, Greece
[3] Hellen Naval Acad, Math Modeling & Applicat Lab, Sect Math, Piraeus, Greece
[4] Univ Sussex, Dept Math, Brighton, E Sussex, England
[5] Univ Aegean, Dept Financial & Management Engn, Chios, Greece
[6] Univ Aegean, Dept Geog, Mitilini, Greece
Keywords
Clustering; geodesics; K-means; structured data; uncertain data; Wasserstein barycenter; TRANSPORTATION
DOI
10.1080/00949655.2021.1903463
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Clustering schemes for uncertain and structured data are considered, relying on the notion of Wasserstein barycenters and accompanied by appropriate clustering indices based on the intrinsic geometry of the Wasserstein space. Such clustering approaches are valuable in fields where observational or experimental error is significant, or where the data are too complex for traditional learning algorithms to be applicable or effective. From this perspective, each observation is identified with an appropriate probability measure, and the proposed clustering schemes rely on discrimination criteria that exploit the geometric structure of the space of probability measures through core techniques from optimal transport theory. The advantages and capabilities of the proposed approach, and the performance of the geodesic criterion, are illustrated through a simulation study and two applications: (a) clustering the bond yield curves of eurozone countries and (b) classifying satellite images into land-use categories.
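The idea of treating each observation as a probability measure and clustering around Wasserstein barycenters can be illustrated with a minimal sketch. It is not the authors' implementation and it omits the geodesic criterion for choosing the number of clusters. It assumes one-dimensional measures, where the 2-Wasserstein distance equals the L2 distance between quantile functions and the barycenter is the average of the quantile functions. The function names, the quantile grid size, and the farthest-point initialization are all illustrative assumptions.

```python
import numpy as np

def quantile_embedding(samples, n_levels=50):
    """Represent a 1D sample by its empirical quantiles on a uniform grid."""
    levels = (np.arange(n_levels) + 0.5) / n_levels
    return np.quantile(np.asarray(samples, dtype=float), levels)

def wasserstein_kmeans_1d(Q, k, n_iter=100):
    """K-means on quantile vectors (hypothetical helper, 1D measures only).

    Q is an (n_measures, n_levels) array of quantile embeddings. The mean
    squared difference of two rows approximates the squared 2-Wasserstein
    distance; the row-wise mean of a cluster is its Wasserstein barycenter.
    """
    Q = np.asarray(Q, dtype=float)
    # Assumed initialization: start from the first measure, then repeatedly
    # add the measure farthest (in W2) from the centers chosen so far.
    centers = [Q[0]]
    for _ in range(1, k):
        d2 = np.min([np.mean((Q - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Q[d2.argmax()])
    centers = np.array(centers)

    labels = np.full(len(Q), -1)
    for _ in range(n_iter):
        # Squared W2 distance from every measure to every barycenter.
        d2 = ((Q[:, None, :] - centers[None, :, :]) ** 2).mean(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = Q[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)  # 1D barycenter update
    return labels, centers
```

For measures on higher-dimensional spaces, such as the satellite images mentioned in the abstract, the distance and barycenter steps no longer have this closed form and would need an optimal transport solver instead.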
Pages: 2569-2594
Number of pages: 26