Bi-dimensional principal gene feature selection from big gene expression data

Cited by: 4
Authors
Hou, Xiaoqian [1]
Hou, Jingyu [1]
Huang, Guangyan [1]
Affiliations
[1] Deakin Univ, Sch Informat Technol, Melbourne, Vic, Australia
Source
PLOS ONE | 2022 / Vol. 17 / Issue 12
DOI
10.1371/journal.pone.0278583
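Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes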
07 ; 0710 ; 09 ;
Abstract
Gene expression sample data, which usually contain massive numbers of gene expression profiles, are commonly used for disease-related gene analysis. Selecting the relevant genes from a huge number of candidates is a fundamental step in applications of gene expression data. As more and more genes are detected, gene expression datasets keep growing, which challenges the computing efficiency of extracting the relevant and important genes. In this paper, we propose a novel Bi-dimensional Principal Feature Selection (BPFS) method for efficiently extracting critical genes from big gene expression data. It applies principal component analysis (PCA) to the sample and gene domains successively, aiming to extract the relevant gene features and reduce redundancy while losing less information. Experimental results on four real-world cancer gene expression datasets show that the proposed BPFS method greatly reduces the data size and processes the data nearly twice as fast as the counterpart methods, while maintaining better accuracy and effectiveness.
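The abstract gives only the outline of BPFS (PCA applied to the sample domain, then to the gene domain), so the following is a minimal sketch of that general idea and not the authors' algorithm. The function names `pca` and `two_pass_gene_ranking`, the component counts, the variance-weighted loading score, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pca(data, n_components):
    """Plain PCA via SVD. Rows are observations, columns are features."""
    centered = data - data.mean(axis=0, keepdims=True)
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    components = components[:n_components]
    scores = centered @ components.T
    return scores, components, singular_values[:n_components]


def two_pass_gene_ranking(expr, n_sample_components, n_gene_components, n_keep):
    """Rank genes with two successive PCA passes (illustrative only).

    expr has shape (n_samples, n_genes).
    """
    # Pass 1 (sample domain): each gene is an observation described by its
    # value in every sample; compress the sample axis to a few components.
    gene_profiles, _, _ = pca(expr.T, n_sample_components)  # (n_genes, k1)

    # Pass 2 (gene domain): genes are now the features; the loadings show how
    # strongly each gene contributes to the leading components.
    _, loadings, singular_values = pca(gene_profiles.T, n_gene_components)

    # Hypothetical scoring rule (an assumption, not taken from the paper):
    # variance-weighted squared loadings, summed over the kept components.
    scores = ((singular_values[:, None] * loadings) ** 2).sum(axis=0)
    keep = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep), scores


# Synthetic example: 40 samples, 200 genes, of which the first five
# follow a shared latent signal and the rest are low-level noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
expr = 0.1 * rng.normal(size=(n_samples, n_genes))
latent = rng.normal(size=n_samples)
expr[:, :5] += 5.0 * latent[:, None]

selected, _ = two_pass_gene_ranking(expr, 5, 3, 5)
print(selected)
```

On this toy data the selected indices should be the five signal-carrying genes. Real expression matrices would need normalization and a principled choice of component counts, neither of which the abstract specifies.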
Pages: 15
Related papers
50 in total
  • [1] Feature selection and gene clustering from gene expression data
    Mitra, P
    Majumder, DD
    PROCEEDINGS OF THE 17TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, 2004, : 343 - 346
  • [2] Optimal Bayesian Feature Selection on High Dimensional Gene Expression Data
    Pour, Ali Foroughi
    Dalton, Lori A.
    2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2014, : 1402 - 1405
  • [3] Gene ontology driven feature selection from microarray gene expression data
    Qi, Jianlong
    Tang, Jian
    PROCEEDINGS OF THE 2006 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2006, : 428 - +
  • [4] A bi-dimensional regression tree approach to the modeling of gene expression regulation
    Ruan, JH
    Zhang, WX
    BIOINFORMATICS, 2006, 22 (03) : 332 - 340
  • [5] Improving the performance of principal components for classification of gene expression data through feature selection
    Acuna, Edgar
    Porras, Jaime
    DATA SCIENCE AND CLASSIFICATION, 2006, : 325 - +
  • [6] Feature Selection in High-Dimensional Space with Applications to Gene Expression Data
    Pantha, Nishan
    Ramasubramanian, Muthukumaran
    Gurung, Iksha
    Maskey, Manil
    Sanders, Lauren M.
    Casaletto, James
    Costes, Sylvain V.
    SOUTHEASTCON 2024, 2024, : 6 - 15
  • [7] Feature Selection in Gene Expression Data Using Principal Component Analysis and Rough Set Theory
    Mishra, Debahuti
    Dash, Rajashree
    Rath, Amiya Kumar
    Acharya, Milu
    SOFTWARE TOOLS AND ALGORITHMS FOR BIOLOGICAL SYSTEMS, 2011, 696 : 91 - 100
  • [8] Minimum redundancy feature selection from microarray gene expression data
    Ding, C
    Peng, HC
    PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, : 523 - 528
  • [9] Data mining for feature selection in gene expression autism data
    Latkowski, Tomasz
    Osowski, Stanislaw
    EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (02) : 864 - 872
  • [10] Informative Feature Clustering and Selection for Gene Expression Data
    Yang, Yuqi
    Yin, Pengshuai
    Luo, Zhihang
    Gu, Wenwen
    Chen, Renjie
    Wu, Qingyao
    IEEE ACCESS, 2019, 7 : 169174 - 169184