High-Dimensional Data Analysis Using Parameter Free Algorithm Data Point Positioning Analysis

Cited by: 0

Authors:

Mustapha, S. M. F. D. Syed [1]

Affiliation:

[1] Zayed Univ, Coll Technol Innovat, POB 19282, Dubai, U Arab Emirates

Source:

APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 10

Keywords:

clustering; parameter-free algorithm; unsupervised learning; data mining; DPPA; MEAN SHIFT; CATEGORICAL-DATA; IDENTIFICATION; VALIDATION; MODELS;

DOI:

10.3390/app14104231

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Clustering is an effective statistical data analysis technique with applications in data mining, pattern recognition, image analysis, bioinformatics, and machine learning. It partitions data into groups of objects with distinct characteristics. Most clustering methods rely on manually selected parameters to find the clusters in a dataset, and finding the optimal parameters can be challenging and time-consuming. Moreover, some clustering methods are inadequate for locating clusters in high-dimensional data. To address these concerns, this paper introduces a novel selection-free clustering technique named data point positioning analysis (DPPA). The proposed method is straightforward: it calculates 1-NN and Max-NN by analyzing the data point placements, with no initial manual parameter assignment. The method is validated on two well-known, publicly available datasets that have been used with several clustering algorithms. For comparison, this study also investigated four popular clustering algorithms (DBSCAN, affinity propagation, Mean Shift, and K-means); the proposed method performed better at finding clusters. The experimental findings demonstrate that the proposed DPPA algorithm is less time-consuming than the existing traditional methods and achieves higher performance without using any manually selected parameters.
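
The abstract names two per-point quantities, 1-NN and Max-NN, but does not define them. As a rough illustration only, and not the DPPA algorithm itself, the sketch below assumes they denote each point's distance to its nearest and to its farthest neighbor. The function name `neighbor_distances` and the sample points are hypothetical.

```python
import math


def neighbor_distances(points):
    """Return (nearest, farthest) Euclidean distance for each point.

    Assumption: "1-NN" is read as the nearest-neighbor distance and
    "Max-NN" as the farthest-neighbor distance; the abstract does not
    define either term. Requires at least two points.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from point i to every other point (brute force, O(n^2)).
        dists = [math.dist(p, q) for j, q in enumerate(points) if j != i]
        result.append((min(dists), max(dists)))
    return result


# Three collinear sample points, spaced 5 units apart.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
summary = neighbor_distances(points)
for point, (nearest, farthest) in zip(points, summary):
    print(point, nearest, farthest)
```

No parameter is supplied beyond the data itself, which is the property the abstract emphasizes. How DPPA turns such quantities into cluster assignments is not stated in this record.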

Pages: 20