Dimensionality Reduction of RNA-Seq Data

Cited by: 0
Author
Al-Turaiki, Isra [1]
Affiliation
[1] King Saud Univ, Coll Comp & Informat Sci, Informat Technol Dept, Riyadh, Saudi Arabia
Source
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY | 2021 / Vol. 21 / No. 03
Keywords
Principal Component Analysis (PCA); Singular Value Decomposition (SVD); Self-Organizing Maps (SOM); RNA-Seq; Dimensionality Reduction;
DOI
10.22937/IJCSNS.2021.21.3.4
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
RNA sequencing (RNA-Seq) is a technology that facilitates transcriptome analysis using next-generation sequencing (NGS) tools. Information on the quantity and sequences of RNA is vital to relate our genomes to functional protein expression. RNA-Seq data are characterized as being high-dimensional in that the number of variables (i.e., transcripts) far exceeds the number of observations (e.g., experiments). Given the wide range of dimensionality reduction techniques, it is not clear which is best for RNA-Seq data analysis. In this paper, we study the effect of three dimensionality reduction techniques on the classification of an RNA-Seq dataset. In particular, we use PCA, SVD, and SOM to obtain a reduced feature space. We built nine classification models for a cancer dataset and compared their performance. Our experimental results indicate that better classification performance is obtained with PCA and SOM. Overall, the combinations PCA+KNN, SOM+RF, and SOM+KNN produce the preferred results.
Pages: 31-36
Number of pages: 6