GEVA: geometric variability-based approaches for identifying patterns in data

Cited by: 10
Authors
Irigoien, Itziar [2]
Arenas, Concepcion [1]
Fernandez, Elena [3]
Mestres, Francisco [4]
Affiliations
[1] Univ Barcelona, Fac Biol, Dept Estadist, E-08028 Barcelona, Spain
[2] UPV, EHU, Dept Computat & Artificial Intelligence, Donostia San Sebastian, Spain
[3] Univ Politecn Cataluna, Dept Estadist & Invest Operat, Barcelona, Spain
[4] Univ Barcelona, Dept Genet, E-08028 Barcelona, Spain
Keywords
Cluster algorithms; Geometric-variability; Divisive algorithm; Agglomerative algorithm; Population studies; CHROMOSOMAL-INVERSION POLYMORPHISM; DROSOPHILA-SUBOBSCURA; MULTIVARIATE-ANALYSIS; DISCRIMINANT-ANALYSIS; DISTANCE; COLONIZATION; POPULATIONS; AMERICA
DOI
10.1007/s00180-009-0173-9
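Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract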
This paper, arising from population studies, develops clustering algorithms for identifying patterns in data. Based on the concept of geometric variability, we develop one polythetic-divisive algorithm and three agglomerative algorithms. The effectiveness of these procedures is shown by relating them to classical clustering algorithms. They are very general: because they impose no constraints on the type of data, they apply to studies in many fields (economics, ecology, genetics, ...). Our major contributions are a rigorous formulation of the novel clustering algorithms and the discovery of a new relationship between geometric variability and clustering. Finally, these procedures provide a theoretical framework, with an intuitive interpretation, for some classical clustering methods, allowing them to be applied to any type of data, including mixed data. The approaches are illustrated with real data on Drosophila chromosomal inversions.
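The abstract does not state a formula for geometric variability. As a rough illustration only, the sketch below assumes the sample form commonly used in distance-based analysis, V = (1 / (2 n^2)) * sum over all ordered pairs (i, j) of d(i, j)^2, computed from a pairwise distance matrix. The function names `geometric_variability` and `distance_matrix` are hypothetical and are not taken from the paper, and the sketch does not implement any of the paper's clustering algorithms.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def geometric_variability(d):
    """Assumed sample form: (1 / (2 n^2)) * sum of squared pairwise distances.

    `d` is a square matrix (list of lists) of pairwise distances.
    """
    n = len(d)
    if any(len(row) != n for row in d):
        raise ValueError("distance matrix must be square")
    total = sum(d[i][j] ** 2 for i in range(n) for j in range(n))
    return total / (2 * n * n)


def distance_matrix(points, distance=dist):
    """Build the full pairwise distance matrix for any distance function."""
    return [[distance(p, q) for q in points] for p in points]


# Four corners of a square with side 2 (illustrative data, not from the paper).
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
v = geometric_variability(distance_matrix(points))
print(v)  # close to 2, up to floating-point rounding
```

With the Euclidean distance, this quantity equals the sum of the per-coordinate variances (computed with divisor n), which is 1 + 1 = 2 for the square above. Because only a distance function is required, the same computation can be run on mixed data by passing a suitable `distance` argument.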
Pages: 241-255
Page count: 15