A Comparison of Dimensionality Reduction Methods for Large Biological Data

Cited by: 0
Authors
Babjac, Ashley [1]
Royalty, Taylor [2]
Steen, Andrew D. [3]
Emrich, Scott J. [1]
Affiliations
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
[2] Univ Tennessee, Dept Earth & Planetary Sci, Knoxville, TN USA
[3] Univ Tennessee, Dept Microbiol, Knoxville, TN 37996 USA
Keywords
autoencoders; dimensionality reduction; classification
DOI
10.1145/3535508.3545536
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-scale data often suffer from the curse of dimensionality and its associated constraints; dimensionality reduction is therefore often applied as a preprocessing step in machine learning pipelines. In this paper, we directly compare the performance of autoencoders as a dimensionality reduction technique (via the latent space) with other established methods: PCA, LASSO, and t-SNE. To do so, we use four distinct datasets that vary in feature types, metadata, labels, and size, so that the methods are compared robustly. We test prediction capability using both Support Vector Machines (SVM) and Random Forests (RF). We conclude that autoencoders are on par with the previously established methods as a dimensionality reduction approach, and often outperform them in both prediction accuracy and run time when condensing large, sparse datasets.
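The evaluation design described in the abstract (reduce the features, then score SVM and RF classifiers on the reduced representation) can be illustrated with a minimal sketch. Everything below is an assumption for illustration and is not the authors' code: it uses scikit-learn, a synthetic dataset from `make_classification` in place of the paper's four datasets, and an L1-penalised linear model as a LASSO-style feature selector. The autoencoder and t-SNE arms are omitted; scikit-learn's `TSNE` has no `transform` method for unseen samples, so it does not fit into a cross-validated pipeline as written here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in data (assumption): 300 samples, 200 features, 10 informative.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=10, random_state=0
)

# Hypothetical choice of 10 retained dimensions for both reducers.
N_DIMS = 10

reducers = {
    "PCA": lambda: PCA(n_components=N_DIMS, random_state=0),
    # LASSO-style selection (assumption): keep the N_DIMS features with the
    # largest absolute coefficients of an L1-penalised linear model.
    "L1-select": lambda: SelectFromModel(
        LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000),
        threshold=-np.inf,
        max_features=N_DIMS,
    ),
}

classifiers = {
    "SVM": lambda: SVC(kernel="rbf"),
    "RF": lambda: RandomForestClassifier(n_estimators=100, random_state=0),
}


def compare(X, y, cv=5):
    """Return mean cross-validated accuracy for every reducer/classifier pair."""
    results = {}
    for r_name, make_reducer in reducers.items():
        for c_name, make_clf in classifiers.items():
            # Fitting the reducer inside the pipeline keeps it from seeing
            # the held-out fold during cross-validation.
            pipe = make_pipeline(StandardScaler(), make_reducer(), make_clf())
            results[(r_name, c_name)] = cross_val_score(pipe, X, y, cv=cv).mean()
    return results


results = compare(X, y)
for (r_name, c_name), acc in sorted(results.items()):
    print(f"{r_name:10s} + {c_name:3s}: mean accuracy = {acc:.3f}")
```

Placing the reducer inside the pipeline is a design choice of this sketch rather than something the abstract specifies; it ensures the reduction is re-fitted on each training fold only.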
Pages: 7
Related papers
  • [1] Applications and Comparison of Dimensionality Reduction Methods for Microbiome Data
    Armstrong, George
    Rahman, Gibraan
    Martino, Cameron
    McDonald, Daniel
    Gonzalez, Antonio
    Mishne, Gal
    Knight, Rob
    FRONTIERS IN BIOINFORMATICS, 2022, 2
  • [2] COMPARISON OF DIMENSIONALITY REDUCTION METHODS APPLIED TO ORDINAL DATA
    Prokop, Martin
    Rezankova, Hana
    7TH INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2013: 1150-1159
  • [3] Dimensionality Reduction in Boolean Data: Comparison of Four BMF Methods
    Bartl, Eduard
    Belohlavek, Radim
    Osicka, Petr
    Rezankova, Hana
    CLUSTERING HIGH-DIMENSIONAL DATA, CHDD 2012, 2015, 7627: 118-133
  • [4] DATA DIMENSIONALITY REDUCTION METHODS FOR ORDINAL DATA
    Prokop, Martin
    Rezankova, Hana
    INTERNATIONAL DAYS OF STATISTICS AND ECONOMICS, 2011: 523-533
  • [5] Comparison of Matrix Dimensionality Reduction Methods in Uncovering Latent Structures in the Data
    Kumar, Ch.
    Palanisamy, Ramaraj
    JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2010, 9 (01): 81-92
  • [6] Comparison Of Linear Dimensionality Reduction Methods On Classification Methods
    Yildiz, Eray
    Sevim, Yusuf
    2016 NATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND BIOMEDICAL ENGINEERING (ELECO), 2016: 161-164
  • [7] Comparison of feature extraction methods in dimensionality reduction
    Wu, Jee-cheng
    Chang, Chiao-Po
    Tsuei, Gwo-Chyang
    CANADIAN JOURNAL OF REMOTE SENSING, 2010, 36 (06): 645-649
  • [8] Dimensionality Reduction Methods: The Comparison of Speed and Accuracy
    Zubova, Jelena
    Kurasova, Olga
    Liutvinavicius, Marius
    INFORMATION TECHNOLOGY AND CONTROL, 2018, 47 (01): 151-160
  • [9] Quality Assessment of Large Scale Dimensionality Reduction Methods
    Banda, Ntombikayise
    Engelbrecht, Andries
    2017 IEEE 4TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI), 2017: 6-10
  • [10] A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data
    Xiang, Ruizhi
    Wang, Wencan
    Yang, Lei
    Wang, Shiyuan
    Xu, Chaohan
    Chen, Xiaowen
    FRONTIERS IN GENETICS, 2021, 12