Gene expression data classification using topology and machine learning models

Cited by: 2
Authors
Dey, Tamal K. [1]
Mandal, Sayan [2]
Mukherjee, Soham [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
National Science Foundation (USA);
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network;
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Interpretation of high-throughput gene expression data continues to require mathematical tools in data analysis that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we utilize some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two aspects: (1) Traditional TDA pipelines use topological signatures called barcodes to enhance feature vectors which are used for classification. In contrast, this work involves curating relevant features to obtain somewhat better representatives with the help of TDA. These representatives of the entire data facilitate better comprehension of the phenotype labels. (2) Most of the earlier works employ barcodes obtained using topological summaries as fingerprints for the data. Even though they are stable signatures, there exists no direct mapping between the data and said barcodes.
Results: The topology-relevant curated data that we obtain provide an improvement in shallow-learning as well as deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures are able to comprehend gene expression levels and classify cohorts accordingly.
Conclusions: In this work, we engender representative persistent cycles to discern the gene expression data. These cycles allow us to directly procure genes involved in similar processes.
Pages: 21
Related papers
50 in total
  • [21] Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine
    Vadapalli, Sreya
    Abdelhalim, Habiba
    Zeeshan, Saman
    Ahmed, Zeeshan
    BRIEFINGS IN BIOINFORMATICS, 2022, 23 (05)
  • [22] Unsupervised Machine Learning Approach for Gene Expression Microarray Data Using Soft Computing Technique
    Rana, Madhurima
    Vijayeeta, Prachi
    Kar, Utsav
    Das, Madhabananda
    Mishra, B. S. P.
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING, NETWORKING AND INFORMATICS (ICACNI 2015), VOL 1, 2016, 43 : 497 - 506
  • [23] Latent Dirichlet Allocation for Classification using Gene Expression Data
    Yalamanchili, Hima Bindu
    Kho, Soon Jye
    Raymer, Michael L.
    2017 IEEE 17TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2017, : 39 - 44
  • [24] Machine Learning Framework for the Prediction of Alzheimer's Disease Using Gene Expression Data Based on Efficient Gene Selection
    El-Gawady, Aliaa
    Makhlouf, Mohamed A.
    Tawfik, BenBella S.
    Nassar, Hamed
    SYMMETRY-BASEL, 2022, 14 (03)
  • [25] Improvement of Lidar data classification algorithm using the machine learning technique
    Haider, Ali
    Tan, Songxin
    POLARIZATION SCIENCE AND REMOTE SENSING IX, 2019, 11132
  • [26] Integrating Deep Learning and SHAP for Breast Cancer Classification and Biomarker Discovery Using Gene Expression Data
    Aliouane, Salah Eddine
    Chehili, Hamza
    Boulahrouf, Khaled
    Abdelaziz, Aya
    Khlifa, Nawres
    Hamidechi, Mohamed Abdelhafid
    IEEE ACCESS, 2025, 13 : 49693 - 49709
  • [27] A Novel Machine-learning Model to Classify Schizophrenia Using Methylation Data Based on Gene Expression
    Vijayakumar, Karthikeyan A.
    Cho, Gwang-Won
    CURRENT BIOINFORMATICS, 2025, 20 (01) : 31 - 45
  • [28] Analyzing RNA-Seq Gene Expression Data Using Deep Learning Approaches for Cancer Classification
    Rukhsar, Laiqa
    Bangyal, Waqas Haider
    Ali Khan, Muhammad Sadiq
    Ag Ibrahim, Ag Asri
    Nisar, Kashif
    Rawat, Danda B.
    APPLIED SCIENCES-BASEL, 2022, 12 (04)
  • [29] GENE EXPRESSION DATA CLASSIFICATION AND PATTERN ANALYSIS USING DATA DRIVEN APPROACH
    Ramisa, Aiman Jabeen
    Hossain, Ananna
    Islam, S. K. Md Injamul
    Swadesh, Ponuel Mollah
    Islam, Md Toushif
    Rahman, Md Anisur
    Parvez, Mohammad Zavid
    PROCEEDINGS OF 2021 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), 2021, : 82 - 90
  • [30] Chatter Classification in Turning using Machine Learning and Topological Data Analysis
    Khasawneh, Firas A.
    Munch, Elizabeth
    Perea, Jose A.
    IFAC PAPERSONLINE, 2018, 51 (14) : 195 - 200