Gene expression data classification using topology and machine learning models

Times cited: 2
Authors
Dey, Tamal K. [1]
Mandal, Sayan [2]
Mukherjee, Soham [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA)
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background
Interpreting high-throughput gene expression data continues to require mathematical tools that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications involving high-dimensional data. In this work, we use some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two respects: (1) Traditional TDA pipelines use topological signatures called barcodes to augment the feature vectors used for classification. In contrast, this work uses TDA to curate the relevant features and obtain somewhat better representatives of the data. These representatives of the entire dataset facilitate a better understanding of the phenotype labels. (2) Most earlier works use barcodes computed from topological summaries as fingerprints for the data. Although barcodes are stable signatures, there is no direct mapping between the data and the barcodes.

Results
The topology-relevant curated data that we obtain improves both shallow-learning and deep-learning-based supervised classification. We further show that the representative cycles we compute tend to align with the phenotype labels even though the labels are not used to compute them. This work thus shows that topological signatures can capture gene expression levels and classify cohorts accordingly.

Conclusions
In this work, we compute representative persistent cycles to interpret the gene expression data. These cycles allow us to directly identify genes involved in similar processes.
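The contrast above, between barcodes used as fingerprints and a direct mapping back to the data, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It computes only the dimension-0 barcode of a Vietoris-Rips filtration over sample vectors, assuming Euclidean distance between expression profiles, using union-find. For each finite bar it records the pair of samples whose edge ended that bar, so each bar points back to specific samples. The helper names h0_barcode and barcode_features are hypothetical. barcode_features imitates the traditional step of flattening a barcode into a fixed-length feature vector. The sketch does not compute the representative persistent cycles that the abstract refers to.

```python
import math
from itertools import combinations

# Hypothetical helper names; an illustrative sketch of dimension-0
# persistence only, not the pipeline described in the paper.


def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def h0_barcode(points):
    """Dimension-0 persistence of the Vietoris-Rips filtration.

    Every point is born at 0. When an edge of length r first joins two
    connected components, one component dies at r (Kruskal / union-find).
    Each finite bar is returned with the edge (i, j) that ended it, so the
    bar maps back to concrete samples. One component never dies; its bar
    is (0, inf) with no death edge.
    """
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (euclidean(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge: one component dies at radius r
            bars.append((0.0, r, (i, j)))
    if points:
        bars.append((0.0, math.inf, None))
    return bars


def barcode_features(bars, threshold):
    """Flatten finite bars into a fixed-length feature vector:
    [number of bars longer than threshold, total finite persistence]."""
    finite = [death - birth for birth, death, _ in bars if death != math.inf]
    return [sum(1 for p in finite if p > threshold), sum(finite)]
```

For example, h0_barcode([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]) yields three finite bars ended by the edges (0, 1), (2, 3) and (1, 2), plus one infinite bar. The long bar ended by edge (1, 2) reflects the gap between the two groups of samples, and the edge itself names the two samples that bridge them.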
Pages: 21