A Comparison of Dimensionality Reduction Methods for Large Biological Data

Cited by: 3
Authors
Babjac, Ashley [1]
Royalty, Taylor [2]
Steen, Andrew D. [3]
Emrich, Scott J. [1]
Affiliations
[1] Univ Tennessee, Dept Elect Engn & Comp Sci, Knoxville, TN 37996 USA
[2] Univ Tennessee, Dept Earth & Planetary Sci, Knoxville, TN USA
[3] Univ Tennessee, Dept Microbiol, Knoxville, TN 37996 USA
Source
13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, BCB 2022 | 2022
Keywords
autoencoders; dimensionality reduction; classification
DOI
10.1145/3535508.3545536
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Large-scale data often suffer from the curse of dimensionality and the constraints associated with it; therefore, dimensionality reduction is often applied as an early step in machine learning pipelines. In this paper, we directly compare the performance of autoencoders as a dimensionality reduction technique (via the latent space) to other established methods: PCA, LASSO, and t-SNE. To make the comparison robust, we use four distinct datasets that vary in feature types, metadata, labels, and size. We test prediction capability using both Support Vector Machines (SVM) and Random Forests (RF). Significantly, we conclude that autoencoders are an equivalent dimensionality reduction architecture to the previously established methods, and often outperform them in both prediction accuracy and time performance when condensing large, sparse datasets.
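The "reduce, then classify" evaluation described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code or data: the dataset is synthetic, PCA is computed by power iteration with deflation, and a nearest-centroid classifier stands in for the SVM and RF models named in the abstract. All function names (`make_data`, `top_components`, and so on) and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def make_data(n=200, d=20, informative=5, seed=1):
    """Synthetic two-class data (an assumption, not the paper's datasets)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 2.0 if label else -2.0
        # Only the first `informative` features carry class signal.
        X.append([rng.gauss(shift if j < informative else 0.0, 1.0)
                  for j in range(d)])
        y.append(label)
    return X, y


def center(rows):
    """Subtract the per-feature mean; return (centered rows, means)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(cov, k, iters=200):
    """Top-k eigenvectors of a symmetric matrix: power iteration + deflation."""
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    rng = random.Random(0)
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue estimate for v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                  for i in range(d))
        comps.append(v)
        # Deflate so the next pass finds the next direction.
        for i in range(d):
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
    return comps


def project(centered, comps):
    """Map centered rows onto the given components."""
    return [[sum(a * b for a, b in zip(r, c)) for c in comps] for r in centered]


def fit_centroids(Z, y):
    """Nearest-centroid 'training': one mean vector per class."""
    cents = {}
    for label in set(y):
        pts = [z for z, t in zip(Z, y) if t == label]
        cents[label] = [sum(col) / len(pts) for col in zip(*pts)]
    return cents


def predict(cents, Z):
    """Assign each row to the class with the closest centroid."""
    return [min(cents, key=lambda c: sum((a - b) ** 2
                                         for a, b in zip(z, cents[c])))
            for z in Z]


X, y = make_data()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

# Fit the reduction on training data only, then reuse it on the test data.
train_centered, means = center(X_train)
comps = top_components(covariance(train_centered), k=2)
Z_train = project(train_centered, comps)
Z_test = project([[a - m for a, m in zip(r, means)] for r in X_test], comps)

model = fit_centroids(Z_train, y_train)
pred = predict(model, Z_test)
accuracy = sum(p == t for p, t in zip(pred, y_test)) / len(y_test)
print(f"features: {len(X[0])} -> {len(Z_test[0])}, test accuracy: {accuracy:.2f}")
```

Under the same assumptions, comparing methods amounts to swapping the reduction step (for example, an autoencoder's latent space in place of `top_components` and `project`) while keeping the split and the downstream classifier fixed, then comparing accuracy and run time.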
Pages: 7