Proposing a Dimensionality Reduction Technique With an Inequality for Unsupervised Learning from High-Dimensional Big Data

Cited: 0
Authors
Ismkhan, Hassan [1 ]
Izadi, Mohammad [1 ]
Institution
[1] Sharif Univ Technol, Fac Comp Engn, Tehran 1458889694, Iran
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2023, Vol. 53, No. 06
Keywords
Clustering algorithms; Task analysis; Feature extraction; Unsupervised learning; Dimensionality reduction; Transforms; Standards; Big data; dimensionality reduction (DR); high-dimensional data; k-means; nearest neighbor (NN); K-MEANS;
DOI
10.1109/TSMC.2023.3234227
Chinese Library Classification
TP [Automation and Computer Technology];
Subject Classification Code
0812 ;
Abstract
The clustering task can be considered one of the most important unsupervised learning tasks. For almost all clustering algorithms, finding the nearest neighbors of a point within a certain radius r (NN-r) is a critical operation. For a high-dimensional dataset, this operation becomes very time-consuming. This article proposes a simple dimensionality reduction (DR) technique. For a point p in d-dimensional space, it produces a point p' in d'-dimensional space, where d' << d. In addition, for any pair of points p and q and their maps p' and q' in the target space, it is proved that |p, q| >= |p', q'|, where |·,·| denotes the Euclidean distance between a pair of points. This property can speed up finding NN-r: for a given radius r and a pair of points p and q, whenever |p', q'| > r, then q cannot be in the NN-r of p, so the full d-dimensional distance need not be computed. Using this pruning rule, the task of finding the NN-r is sped up. Then, as a case study, the technique is applied to accelerate k-means, one of the most famous unsupervised learning algorithms, where it can automatically determine d'. The proposed NN-r method and the accelerated k-means are compared with recent state-of-the-art methods, and both yield favorable results.
Pages: 3880-3889 (10 pages)
Related Papers (50 total)
  • [31] On dimensionality reduction of high dimensional data sets
    Chizi, B
    Shmilovici, A
    Maimon, O
    INTELLIGENT TECHNOLOGIES - THEORY AND APPLICATIONS: NEW TRENDS IN INTELLIGENT TECHNOLOGIES, 2002, 76 : 233 - 238
  • [32] Manifold learning: Dimensionality reduction and high dimensional data reconstruction via dictionary learning
    Zhao, Zhong
    Feng, Guocan
    Zhu, Jiehua
    Shen, Qi
    NEUROCOMPUTING, 2016, 216 : 268 - 285
  • [33] Learning from label proportions on high-dimensional data
    Shi, Yong
    Liu, Jiabin
    Qi, Zhiquan
    Wang, Bo
    NEURAL NETWORKS, 2018, 103 : 9 - 18
  • [34] Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey
    Karbauskaite, Rasa
    Dzemyda, Gintautas
    INFORMATICA, 2016, 27 (02) : 257 - 281
  • [35] Dimensionality Reduction and Subspace Clustering in Mixed Reality for Condition Monitoring of High-Dimensional Production Data
    Hoppenstedt, Burkhard
    Reichert, Manfred
    Kammerer, Klaus
    Probst, Thomas
    Schlee, Winfried
    Spiliopoulou, Myra
    Pryss, Ruediger
    SENSORS, 2019, 19 (18)
  • [36] Dimensionality Reduction of High-Dimensional Data With a Nonlinear Principal Component Aligned Generative Topographic Mapping
    Griebel, M.
    Hullmann, A.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2014, 36 (03) : A1027 - A1047
  • [37] A Group Feature Ranking and Selection Method Based on Dimension Reduction Technique in High-Dimensional Data
    Zubair, Iqbal Muhammad
    Kim, Byunghoon
    IEEE ACCESS, 2022, 10 : 125136 - 125147
  • [38] The analysis on dimensionality reduction mathematical model based on feedback constraint for High-dimensional information
    Peng, Wu
    ADVANCES IN MECHATRONICS, AUTOMATION AND APPLIED INFORMATION TECHNOLOGIES, PTS 1 AND 2, 2014, 846-847 : 1056 - 1059
  • [39] Learning high-dimensional multimedia data
    Xiaofeng Zhu
    Zhi Jin
    Rongrong Ji
    Multimedia Systems, 2017, 23 : 281 - 283
  • [40] Visual cluster separation using high-dimensional sharpened dimensionality reduction
    Kim, Youngjoo
    Telea, Alexandru C.
    Trager, Scott C.
    Roerdink, Jos B. T. M.
    INFORMATION VISUALIZATION, 2022, 21 (03) : 246 - 269