Proposing a Dimensionality Reduction Technique With an Inequality for Unsupervised Learning from High-Dimensional Big Data

Cited: 0
Authors
Ismkhan, Hassan [1 ]
Izadi, Mohammad [1 ]
Institution
[1] Sharif Univ Technol, Fac Comp Engn, Tehran 1458889694, Iran
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2023, Vol. 53, No. 06
Keywords
Clustering algorithms; Task analysis; Feature extraction; Unsupervised learning; Dimensionality reduction; Transforms; Standards; Big data; dimensionality reduction (DR); high-dimensional data; k-means; nearest neighbor (NN); K-MEANS;
DOI
10.1109/TSMC.2023.3234227
Chinese Library Classification
TP [Automation and Computer Technology];
Subject Classification Code
0812 ;
Abstract
The clustering task can be considered one of the most important unsupervised learning tasks. For almost all clustering algorithms, finding the nearest neighbors of a point within a certain radius r (NN-r) is a critical operation. For a high-dimensional dataset, this operation becomes very time-consuming. This article proposes a simple dimensionality reduction (DR) technique. For a point p in d-dimensional space, it produces a point p' in d'-dimensional space, where d' << d. In addition, for any pair of points p and q and their maps p' and q' in the target space, it is proved that |p, q| >= |p', q'|, where |·,·| denotes the Euclidean distance between a pair of points. This property can speed up finding NN-r: for a given radius r and a pair of points p and q, whenever |p', q'| > r, then q cannot be in the NN-r of p, so the full d-dimensional distance need not be computed. Using this pruning rule, the task of finding the NN-r is sped up. Then, as a case study, the technique is applied to accelerate k-means, one of the most famous unsupervised learning algorithms, where it can automatically determine d'. The proposed NN-r method and the accelerated k-means are compared with recent state-of-the-art methods, and both yield favorable results.
Pages: 3880-3889 (10 pages)
Related Papers (50 total)
  • [31] On dimensionality reduction of high dimensional data sets
    Chizi, B
    Shmilovici, A
    Maimon, O
    INTELLIGENT TECHNOLOGIES - THEORY AND APPLICATIONS: NEW TRENDS IN INTELLIGENT TECHNOLOGIES, 2002, 76 : 233 - 238
  • [32] Manifold learning: Dimensionality reduction and high dimensional data reconstruction via dictionary learning
    Zhao, Zhong
    Feng, Guocan
    Zhu, Jiehua
    Shen, Qi
    NEUROCOMPUTING, 2016, 216 : 268 - 285
  • [33] Learning from label proportions on high-dimensional data
    Shi, Yong
    Liu, Jiabin
    Qi, Zhiquan
    Wang, Bo
    NEURAL NETWORKS, 2018, 103 : 9 - 18
  • [34] Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey
    Karbauskaite, Rasa
    Dzemyda, Gintautas
    INFORMATICA, 2016, 27 (02) : 257 - 281
  • [35] Dimensionality Reduction and Subspace Clustering in Mixed Reality for Condition Monitoring of High-Dimensional Production Data
    Hoppenstedt, Burkhard
    Reichert, Manfred
    Kammerer, Klaus
    Probst, Thomas
    Schlee, Winfried
    Spiliopoulou, Myra
    Pryss, Ruediger
    SENSORS, 2019, 19 (18)
  • [36] Dimensionality Reduction of High-Dimensional Data With a Nonlinear Principal Component Aligned Generative Topographic Mapping
    Griebel, M.
    Hullmann, A.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (03) : A1027 - A1047
  • [37] A Group Feature Ranking and Selection Method Based on Dimension Reduction Technique in High-Dimensional Data
    Zubair, Iqbal Muhammad
    Kim, Byunghoon
    IEEE ACCESS, 2022, 10 : 125136 - 125147
  • [38] The analysis on dimensionality reduction mathematical model based on feedback constraint for High-dimensional information
    Peng, Wu
    ADVANCES IN MECHATRONICS, AUTOMATION AND APPLIED INFORMATION TECHNOLOGIES, PTS 1 AND 2, 2014, 846-847 : 1056 - 1059
  • [39] Learning high-dimensional multimedia data
    Xiaofeng Zhu
    Zhi Jin
    Rongrong Ji
    Multimedia Systems, 2017, 23 : 281 - 283
  • [40] Visual cluster separation using high-dimensional sharpened dimensionality reduction
    Kim, Youngjoo
    Telea, Alexandru C.
    Trager, Scott C.
    Roerdink, Jos B. T. M.
    INFORMATION VISUALIZATION, 2022, 21 (03) : 246 - 269