Fast and robust clustering of general-shaped structures with tk-merge

Cited by: 1

Authors:

Insolia, Luca [1]

Perrotta, Domenico [2]

Affiliations:

[1] Univ Geneva, Geneva Sch Econ & Management, Blvd Pont Arve 40, CH-1211 Geneva, Switzerland

[2] Joint Res Ctr JRC, European Commiss, Via Enr Fermi 2479, I-21027 Ispra, Italy

Source:

INTERNATIONAL JOURNAL OF APPROXIMATE REASONING | 2024 / Vol. 168

Keywords:

Robust statistics; Model-based clustering; Hierarchical clustering; TRIMMING APPROACH; R PACKAGE; K-MEANS; REGRESSION; NUMBER; OUTLIERS;

DOI:

10.1016/j.ijar.2024.109152

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In real-world applications, the group of provenance of data can be inherently uncertain, the data values can be imprecise and some of them can be wrong. We handle uncertain, imprecise and noisy data in clustering problems with general-shaped structures. We do it under very weak parametric assumptions with a two-step hybrid robust clustering algorithm based on trimmed k-means and hierarchical agglomeration. The algorithm has low computational complexity and effectively identifies the clusters also in presence of data contamination. We also present natural generalizations of the approach as well as an adaptive procedure to estimate the amount of contamination in a data-driven fashion. Our proposal outperforms state-of-the-art robust, model-based methods in our numerical simulations and real-world applications related to color quantization for image analysis, human mobility patterns based on GPS data, biomedical images of diabetic retinopathy, and weather data.
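
The abstract describes the method only at a high level. As a rough illustration of the two-step idea (trimmed k-means followed by agglomeration of the resulting centroids), the following Python sketch may help. It is not the authors' implementation: the deterministic initialisation, the single-linkage merging rule, the toy data, and all names (`trimmed_kmeans`, `merge_centroids`, `alpha`, `n_groups`) are assumptions made for this example only.

```python
import math


def trimmed_kmeans(points, k, alpha, n_iter=50):
    """Lloyd-style trimmed k-means (illustrative sketch).

    At every iteration the ceil(alpha * n) points farthest from their
    nearest centroid are trimmed (label -1) and ignored in the update.
    Assumption: centroids start from evenly spaced entries of `points`
    so that the example is reproducible.
    """
    n = len(points)
    centroids = [points[i * n // k] for i in range(k)]
    n_trim = math.ceil(alpha * n)
    labels = [-1] * n
    for _ in range(n_iter):
        # Squared distance to, and index of, the nearest centroid.
        nearest = [
            min((math.dist(p, c) ** 2, j) for j, c in enumerate(centroids))
            for p in points
        ]
        # Keep the n - n_trim closest points; the rest are trimmed.
        order = sorted(range(n), key=lambda i: nearest[i][0])
        kept = order[: n - n_trim]
        labels = [-1] * n
        for i in kept:
            labels[i] = nearest[i][1]
        # Update each centroid as the mean of its untrimmed members.
        for j in range(k):
            members = [points[i] for i in kept if labels[i] == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    # Labels come from the last assignment step.
    return centroids, labels


def merge_centroids(centroids, n_groups):
    """Single-linkage agglomeration of centroids down to n_groups groups.

    Assumption: single linkage on Euclidean distances stands in for
    whatever merging criterion the paper actually uses.
    """
    group = list(range(len(centroids)))  # group id of each centroid
    pairs = sorted(
        (math.dist(centroids[a], centroids[b]), a, b)
        for a in range(len(centroids))
        for b in range(a + 1, len(centroids))
    )
    remaining = len(centroids)
    for _, a, b in pairs:
        if remaining <= n_groups:
            break
        if group[a] != group[b]:
            old, new = group[b], group[a]
            group = [new if g == old else g for g in group]
            remaining -= 1
    return group


# Toy data: two elongated (non-spherical) groups plus two gross outliers.
line_a = [(i * 0.1, 0.0) for i in range(101)]
line_b = [(i * 0.1, 5.0) for i in range(101)]
outliers = [(100.0, 100.0), (-100.0, 50.0)]
points = line_a + line_b + outliers

centroids, labels = trimmed_kmeans(points, k=6, alpha=0.02)
group = merge_centroids(centroids, n_groups=2)
# Final label of each point: merged group id, or -1 if trimmed.
merged = [group[l] if l != -1 else -1 for l in labels]
```

In this toy setting, deliberately using more centroids than groups (`k=6`) lets several small, roughly spherical clusters tile each elongated structure, and the merging step then joins them into two general-shaped groups while the trimmed points stay unassigned.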

Citations

Pages: 17

3 items in total

[1] Tk-Merge: Computationally Efficient Robust Clustering Under General Assumptions
Insolia, Luca
Perrotta, Domenico
BUILDING BRIDGES BETWEEN SOFT AND STATISTICAL METHODOLOGIES FOR DATA SCIENCE, 2023, 1433 : 216 - 223
[2] Merging K-means with hierarchical clustering for identifying general-shaped groups
Peterson, Anna D.
Ghosh, Arka R.
Maitra, Ranjan
STAT, 2018, 7 (01):
[3] Fast and robust general purpose clustering algorithms
Estivill-Castro, V
Yang, J
DATA MINING AND KNOWLEDGE DISCOVERY, 2004, 8 (02) : 127 - 150
