Application of Biological Domain Knowledge Based Feature Selection on Gene Expression Data

Cited by: 43
Authors
Yousef, Malik [1, 2]
Kumar, Abhishek [3, 4]
Bakir-Gungor, Burcu [5]
Affiliations
[1] Zefat Acad Coll, Dept Informat Syst, IL-13206 Safed, Israel
[2] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, IL-13206 Safed, Israel
[3] Inst Bioinformat, Int Technol Pk, Bangalore 560066, Karnataka, India
[4] Manipal Acad Higher Educ MAHE, Manipal 576104, India
[5] Abdullah Gul Univ, Dept Comp Engn, Fac Engn, TR-38080 Kayseri, Turkey
Keywords
feature selection; feature ranking; grouping; clustering; biological knowledge
DOI
10.3390/e23010002
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Over the last two decades, massive advances in high-throughput technologies have led to exponential growth in public repositories of gene expression datasets for various phenotypes. Biomarkers can be uncovered by comparing gene expression levels under different conditions, such as disease vs. control, treated vs. not treated, or drug A vs. drug B. This task corresponds to a well-studied problem in machine learning, namely feature selection. In biological data analysis, most computational feature selection methodologies were taken from other fields without considering the nature of biological data. Integrative approaches that use biological knowledge while performing feature selection are therefore necessary for this kind of data. The main idea behind the integrative gene selection process is to generate a ranked list of genes that considers both the statistical metrics applied to the gene expression data and the biological background information provided as external datasets. One of the main goals of this review is to explore the existing methods that integrate different types of information in order to improve the identification of biomolecular signatures of diseases and the discovery of new potential targets for treatment. These integrative approaches are expected to aid the prediction, diagnosis, and treatment of diseases, and to shed light on disease state dynamics and the mechanisms of disease onset and progression. Integrating various types of biological information will require the development of novel techniques for integration and data analysis. Another aim of this review is to encourage the bioinformatics community to develop new approaches for searching for and determining significant groups/clusters of features based on one or more biological grouping functions.
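As a rough illustration of the integrative ranking idea described in the abstract, the following minimal sketch combines a per-gene statistical score with an externally supplied biological relevance score. It is not taken from the review: the function names, the weight `alpha`, the min-max rescaling, the weighted-sum combination, and the toy gene names and values are all assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(a, b):
    """Absolute Welch t-statistic between two samples of expression values."""
    var_a, var_b = stdev(a) ** 2, stdev(b) ** 2
    return abs(mean(a) - mean(b)) / sqrt(var_a / len(a) + var_b / len(b))


def min_max(scores):
    """Rescale a {gene: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {g: (s - lo) / span for g, s in scores.items()}


def integrative_rank(disease, control, prior, alpha=0.5):
    """Rank genes by a weighted sum of statistical and biological scores.

    disease, control: {gene: [expression values]} for the two conditions.
    prior: {gene: relevance score} from an external knowledge source
           (hypothetical; genes missing from it get a score of 0).
    alpha: weight of the statistical score (assumed parameter, 0..1).
    """
    stat = min_max({g: welch_t(disease[g], control[g]) for g in disease})
    bio = min_max({g: prior.get(g, 0.0) for g in disease})
    combined = {g: alpha * stat[g] + (1 - alpha) * bio[g] for g in disease}
    return sorted(combined, key=combined.get, reverse=True)


# Toy data with hypothetical gene names and values.
disease = {"GENE_A": [5.1, 5.3, 5.2], "GENE_B": [2.0, 2.1, 1.9], "GENE_C": [3.0, 3.6, 3.3]}
control = {"GENE_A": [5.0, 5.2, 5.1], "GENE_B": [1.0, 1.1, 0.9], "GENE_C": [3.1, 3.4, 3.2]}
prior = {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.1}

ranking = integrative_rank(disease, control, prior)
print(ranking)
```

Setting `alpha=1.0` reduces the sketch to a purely statistical ranking, while `alpha=0.0` ranks genes by the external knowledge score alone. A weighted sum is only one possible way to combine the two sources of evidence.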
Pages: 1 / 15
Page count: 15