Feature Selection in Single-Cell RNA-seq Data via a Genetic Algorithm

Cited by: 3
Authors
Chatzilygeroudis, Konstantinos I. [1, 2]
Vrahatis, Aristidis G. [3]
Tasoulis, Sotiris K. [3]
Vrahatis, Michael N. [2]
Affiliations
[1] Univ Patras, CEID, Patras, Greece
[2] Univ Patras, Dept Math, Computat Intelligence Lab, Patras, Greece
[3] Univ Thessaly, Dept Comp Sci & Biomed Informat, Volos, Greece
Source
LEARNING AND INTELLIGENT OPTIMIZATION, LION 15 | 2021 / Vol. 12931
Keywords
Feature selection; Optimization; Single-cell RNA-seq; High-dimensional data; EXPRESSION DATA; CLASSIFICATION; KERNEL;
DOI
10.1007/978-3-030-92121-7_6
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Big data methods prevail in the biomedical domain, leading to effective and scalable data-driven approaches. Biomedical data are known for their ultra-high dimensionality, especially data coming from molecular biology experiments. This property also characterizes the emerging technique of single-cell RNA-sequencing (scRNA-seq), where we obtain sequence information from individual cells. A reliable way to uncover the complexity of such data is to use Machine Learning approaches, including dimensionality reduction and feature selection methods. Although the former has seen remarkable progress in scRNA-seq data, only the latter can offer deeper interpretability at the gene level, since it highlights the dominant gene features in the given data. To tackle this challenge, we propose a feature selection framework that utilizes genetic optimization principles and identifies low-dimensional combinations of gene lists in order to enhance the classification performance of any off-the-shelf classifier (e.g., LDA or SVM). Our intuition is that by identifying an optimal gene subset, we can enhance the prediction power of scRNA-seq data even if these genes are unrelated to each other. We showcase our proposed framework's effectiveness in two real scRNA-seq experiments with gene dimensions up to 36708. Our framework can identify very low-dimensional subsets of genes (fewer than 200) while boosting the classifiers' performance. Finally, we provide a biological interpretation of the selected genes, thus providing evidence of our method's utility towards explainable artificial intelligence.
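The general idea described in the abstract, a genetic algorithm searching over gene subsets with a classifier's accuracy as the fitness signal, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a binary mask encoding, a nearest-centroid classifier as a lightweight stand-in for LDA or SVM, synthetic data in place of real scRNA-seq counts, and hypothetical names and parameters throughout (`make_data`, `select_genes`, `penalty`, `p_mut`, and so on).

```python
import random

def make_data(n_cells=60, n_genes=40, informative=(0, 1, 2), seed=0):
    """Synthetic stand-in for an expression matrix (assumption, not real scRNA-seq)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cells):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:          # only these genes carry class signal
            row[g] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, genes, train_idx, test_idx):
    """Hold-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    if not genes:
        return 0.0
    cents = {}
    for c in set(y[i] for i in train_idx):
        rows = [X[i] for i in train_idx if y[i] == c]
        cents[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]
    hits = 0
    for i in test_idx:
        pred = min(cents, key=lambda c: sum(
            (X[i][g] - m) ** 2 for g, m in zip(genes, cents[c])))
        hits += pred == y[i]
    return hits / len(test_idx)

def fitness(mask, X, y, train_idx, test_idx, penalty=0.01):
    """Accuracy minus a small cost per selected gene (hypothetical trade-off)."""
    genes = [g for g, bit in enumerate(mask) if bit]
    return centroid_accuracy(X, y, genes, train_idx, test_idx) - penalty * len(genes)

def select_genes(X, y, pop_size=30, generations=40, p_mut=0.02, seed=1):
    """Evolve binary gene masks with elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    idx = list(range(len(X)))
    rng.shuffle(idx)
    split = len(idx) // 2
    train_idx, test_idx = idx[:split], idx[split:]

    def score(mask):
        return fitness(mask, X, y, train_idx, test_idx)

    # Sparse random initial population: each gene is included with probability 0.1.
    pop = [[rng.random() < 0.1 for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        elite = ranked[: pop_size // 5]            # best masks survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(ranked[: pop_size // 2], 2)   # parents from top half
            cut = rng.randrange(1, n_genes)
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=score)
    return [g for g, bit in enumerate(best) if bit], score(best)

if __name__ == "__main__":
    X, y = make_data()
    selected, best_score = select_genes(X, y)
    print("selected genes:", selected)
    print("penalized hold-out score:", round(best_score, 3))
```

The per-gene penalty is one simple way to push the search toward small subsets. A real study would more likely use cross-validation rather than a single hold-out split, and would swap in the classifier of interest for the nearest-centroid stand-in.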
Pages: 66-79
Number of pages: 14