Identifying redundant features using unsupervised learning for high-dimensional data

Cited by: 0
Authors
Asir Antony Gnana Singh Danasingh
Appavu alias Balamurugan Subramanian
Jebamalar Leavline Epiphany
Affiliations
[1] Anna University, Department of Computer Science and Engineering
[2] K.L.N College of Information Technology, Department of Information Technology
[3] Anna University, Department of Electronics and Communication Engineering
Source
SN Applied Sciences | 2020 / Vol. 2
Keywords
Clustering; Redundancy rate; K-means clustering; EM clustering; Hierarchical clustering
DOI
Not available
Abstract
In the digital era, classifiers play a vital role in machine learning applications such as medical diagnosis, weather prediction and pattern recognition. Classifiers are built from data by classification algorithms. Data today are often high dimensional, because advances in information and communication technology generate them on a massive scale. A high-dimensional feature space contains irrelevant and redundant features, both of which reduce classification accuracy and increase the storage space and build time of the classifiers. The relevancy and redundancy analysis mechanisms of the feature selection process remove these irrelevant and redundant features. Identifying irrelevant features is comparatively simple, since it only requires measuring the relevancy between each feature and the target class of a dataset with a statistical or information-theoretic measure. Identifying redundant features is considerably harder, especially in high-dimensional space, since it requires measuring the relevancy among the features themselves. This increases computational complexity, and an inappropriate relevancy measure can degrade classification accuracy. To overcome these problems, this paper presents an unsupervised learning-based redundancy analysis mechanism for feature selection and evaluates various clustering techniques in terms of average redundancy rate and runtime.
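The idea of clustering-based redundancy analysis can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not state: features are grouped by running k-means on the standardized feature columns, one representative per cluster is kept (the feature nearest the centroid), and the redundancy rate is taken to be the mean absolute pairwise Pearson correlation among the kept features. The function names, the farthest-point initialisation and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def redundancy_rate(X):
    """Mean absolute pairwise Pearson correlation between the columns of X.

    Assumed definition for illustration; the abstract gives no formula.
    """
    m = X.shape[1]
    if m < 2:
        return 0.0
    corr = np.corrcoef(X, rowvar=False)
    off_diagonal = ~np.eye(m, dtype=bool)
    return float(np.abs(corr[off_diagonal]).mean())


def assign(points, centers):
    """Index of the nearest center (Euclidean distance) for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point initialisation."""
    chosen = [0]
    for _ in range(1, k):
        dists = np.linalg.norm(
            points[:, None, :] - points[chosen][None, :, :], axis=2
        )
        # Next seed: the point farthest from all seeds chosen so far.
        chosen.append(int(dists.min(axis=1).argmax()))
    centers = points[chosen].copy()
    for _ in range(n_iter):
        labels = assign(points, centers)
        new_centers = centers.copy()
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center as is
                new_centers[c] = points[labels == c].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(points, centers), centers


def select_representatives(X, k):
    """Cluster the features (columns) of X and keep one feature per cluster."""
    # Standardize each feature; assumes no feature is constant.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    feats = Z.T  # one row per feature
    labels, centers = kmeans(feats, k)
    selected = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size == 0:
            continue
        d = np.linalg.norm(feats[members] - centers[c], axis=1)
        selected.append(int(members[d.argmin()]))
    return sorted(selected)


# Synthetic data: 3 independent signals, each copied 3 times with small noise,
# so columns 0-2, 3-5 and 6-8 form three groups of redundant features.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
X = np.repeat(latent, 3, axis=1) + 0.1 * rng.normal(size=(200, 9))

selected = select_representatives(X, k=3)
print("selected features:", selected)
print("redundancy rate, all features:", round(redundancy_rate(X), 3))
print("redundancy rate, selected:", round(redundancy_rate(X[:, selected]), 3))
```

On this synthetic data the selected subset keeps one feature from each redundant group, so its redundancy rate falls below that of the full feature set. Runtime could be compared across clustering methods by timing `select_representatives` with `time.perf_counter`, and the k-means step could be swapped for EM or hierarchical clustering to mirror the comparison the abstract describes.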
Related papers
50 in total
  • [1] Identifying redundant features using unsupervised learning for high-dimensional data
    Danasingh, Asir Antony Gnana Singh
    Subramanian, Appavu alias Balamurugan
    Epiphany, Jebamalar Leavline
    SN APPLIED SCIENCES, 2020, 2 (08)
  • [2] Flexible High-Dimensional Unsupervised Learning with Missing Data
    Wei, Yuhong
    Tang, Yang
    McNicholas, Paul D.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2020, 42 (03) : 610 - 621
  • [3] Unsupervised universal steganalyzer for high-dimensional steganalytic features
    Hou, Xiaodan
    Zhang, Tao
    JOURNAL OF ELECTRONIC IMAGING, 2016, 25 (06)
  • [4] Learning high-dimensional data
    Verleysen, M
    LIMITATIONS AND FUTURE TRENDS IN NEURAL COMPUTATION, 2003, 186 : 141 - 162
  • [5] Deep-learning approach to identifying cancer subtypes using high-dimensional genomic data
    Chen, Runpu
    Yang, Le
    Goodison, Steve
    Sun, Yijun
    BIOINFORMATICS, 2020, 36 (05) : 1476 - 1483
  • [6] A General Framework for High-Dimensional Data Reduction Using Unsupervised Bayesian Model
    Jin, Longcun
    Wan, Wanggen
    Wu, Yongliang
    Cui, Bin
    Yu, Xiaoqing
    LIFE SYSTEM MODELING AND INTELLIGENT COMPUTING, PT II, 2010, 98 : 96 - 101
  • [7] Unsupervised representation learning on high-dimensional clinical data improves genomic discovery and prediction
    Yun, Taedong
    Cosentino, Justin
    Behsaz, Babak
    McCaw, Zachary R.
    Hill, Davin
    Luben, Robert
    Lai, Dongbing
    Bates, John
    Yang, Howard
    Schwantes-An, Tae-Hwi
    Zhou, Yuchen
    Khawaja, Anthony P.
    Carroll, Andrew
    Hobbs, Brian D.
    Cho, Michael H.
    McLean, Cory Y.
    Hormozdiari, Farhad
    NATURE GENETICS, 2024, 56 (08) : 1604 - 1613
  • [8] Learning high-dimensional multimedia data
    Xiaofeng Zhu
    Zhi Jin
    Rongrong Ji
    Multimedia Systems, 2017, 23 : 281 - 283
  • [9] Learning to visualise high-dimensional data
    Ahmad, K
    Vrusias, B
    EIGHTH INTERNATIONAL CONFERENCE ON INFORMATION VISUALISATION, PROCEEDINGS, 2004 : 507 - 512
  • [10] Active Learning for High-Dimensional Binary Features
    Vahdat, Ali
    Belbahri, Mouloud
    Nia, Vahid Partovi
    2019 15TH INTERNATIONAL CONFERENCE ON NETWORK AND SERVICE MANAGEMENT (CNSM), 2019