Gene expression data classification using topology and machine learning models

Times cited: 2
Authors
Dey, Tamal K. [1]
Mandal, Sayan [2]
Mukherjee, Soham [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
[2] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Topological data analysis; Gene expression; Persistent cycles; Neural network
DOI
10.1186/s12859-022-04704-z
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: Interpreting high-throughput gene expression data continues to require mathematical tools that recognize the shape of the data in high dimensions. Topological data analysis (TDA) has recently been successful in extracting robust features in several applications dealing with high-dimensional constructs. In this work, we use some recent developments in TDA to curate gene expression data. Our work differs from its predecessors in two respects. (1) Traditional TDA pipelines use topological signatures called barcodes to enhance the feature vectors used for classification. In contrast, this work curates relevant features with the help of TDA to obtain somewhat better representatives of the data. These representatives of the entire dataset facilitate a better comprehension of the phenotype labels. (2) Most earlier works employ barcodes obtained from topological summaries as fingerprints for the data. Although barcodes are stable signatures, there is no direct mapping between the data and the barcodes.

Results: The topology-relevant curated data we obtain improves both shallow-learning and deep-learning-based supervised classification. We further show that the representative cycles we compute have an unsupervised inclination towards phenotype labels. This work thus shows that topological signatures can capture gene expression levels and classify cohorts accordingly.

Conclusions: In this work, we generate representative persistent cycles to interpret the gene expression data. These cycles allow us to directly identify genes involved in similar processes.
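The abstract's key contrast is between barcodes, which summarize topology but give no direct map back to the data, and representative cycles, which do. As a rough illustration only (this is not the authors' algorithm, which this record does not describe), the Python sketch below builds a Vietoris-Rips complex on a small point cloud, reduces its Z/2 boundary matrix with the standard persistence algorithm, and returns, for each 1-dimensional bar, the edges of one representative cycle. The function names `vietoris_rips_h1` and `cycle_vertices`, the `max_edge` cutoff, and the assumption that each point stands for one gene are all hypothetical.

```python
import itertools
import math


def vietoris_rips_h1(points, max_edge=math.inf):
    """Return H1 bars as (birth, death, cycle_edges) for a Vietoris-Rips filtration.

    Illustrative sketch: standard Z/2 column reduction on the 2-skeleton,
    not the paper's algorithm. Zero-length bars are dropped.
    """
    n = len(points)
    # Pairwise Euclidean distances, keyed by sorted vertex pairs (i, j), i < j.
    dist = {(i, j): math.dist(points[i], points[j])
            for i, j in itertools.combinations(range(n), 2)}
    # Filtration values: 0 for vertices, length for edges, longest edge for triangles.
    simplices = [(0.0, (i,)) for i in range(n)]
    simplices += [(d, e) for e, d in dist.items() if d <= max_edge]
    for tri in itertools.combinations(range(n), 3):
        value = max(dist[e] for e in itertools.combinations(tri, 2))
        if value <= max_edge:
            simplices.append((value, tri))
    # Faces must precede cofaces: sort by value, then dimension, then vertices.
    simplices.sort(key=lambda s: (s[0], len(s[1]), s[1]))
    index = {s: k for k, (_, s) in enumerate(simplices)}

    # Boundary columns over Z/2, stored as sets of row indices.
    R = [{index[f] for f in itertools.combinations(s, len(s) - 1)} if len(s) > 1 else set()
         for _, s in simplices]
    V = [{k} for k in range(len(simplices))]  # tracks which columns were added
    owner = {}  # lowest nonzero row -> the column that has it as its pivot
    for j in range(len(R)):
        while R[j] and max(R[j]) in owner:
            k = owner[max(R[j])]
            R[j] ^= R[k]  # addition mod 2 is symmetric difference
            V[j] ^= V[k]
        if R[j]:
            owner[max(R[j])] = j

    bars = []
    for i, (birth, s) in enumerate(simplices):
        if len(s) != 2 or R[i]:
            continue  # only edges whose column reduced to zero create 1-cycles
        j = owner.get(i)  # the triangle that kills this cycle, if any
        death = simplices[j][0] if j is not None else math.inf
        if death <= birth:
            continue
        # A killed class is represented by the reduced triangle column R[j]
        # (all its edges enter no later than edge i); a class that never dies
        # is represented by the recorded column combination V[i].
        cycle = R[j] if j is not None else V[i]
        bars.append((birth, death, sorted(simplices[k][1] for k in cycle)))
    return bars


def cycle_vertices(edges):
    """Points (hypothetically genes) that a representative cycle passes through."""
    return sorted({v for edge in edges for v in edge})


# Four points on a unit square: one loop born at 1.0 and filled in at sqrt(2).
for birth, death, edges in vietoris_rips_h1([(0, 0), (1, 0), (1, 1), (0, 1)]):
    print(birth, death, edges, cycle_vertices(edges))
```

Standard reduction returns *a* valid representative, not necessarily the shortest or most interpretable one; whether the authors use an optimized choice of cycle is not stated in this record. Because every triangle is enumerated, the sketch scales as O(n^3) in the number of points and is suitable only for toy inputs.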
Pages: 21
Related papers (50 in total)
  • [41] Predicting Fitness-Related Traits Using Gene Expression and Machine Learning
    Henry, Georgia A.
    Stinchcombe, John R.
    GENOME BIOLOGY AND EVOLUTION, 2025, 17 (02)
  • [42] High-accuracy prediction of colorectal cancer chemotherapy efficacy using machine learning applied to gene expression data
    Amniouel, Soukaina
    Jafri, Mohsin Saleet
    FRONTIERS IN PHYSIOLOGY, 2024, 14
  • [43] Improving the Prediction of Survival in Cancer Patients by Using Machine Learning Techniques: Experience of Gene Expression Data: A Narrative Review
    Bashiri, Azadeh
    Ghazisaeedi, Marjan
    Safdari, Reza
    Shahmoradi, Leila
    Ehtesham, Hamide
    IRANIAN JOURNAL OF PUBLIC HEALTH, 2017, 46 (02) : 165 - 172
  • [44] Prediction of Recurrence in Non Small Cell Lung Cancer Patients with Gene Expression Data Using Machine Learning Techniques
    Bhattacharjee, Sudipto
    Saha, Banani
    Saha, Sudipto
    2023 INTERNATIONAL CONFERENCE ON COMPUTER, ELECTRICAL & COMMUNICATION ENGINEERING, ICCECE, 2023
  • [45] Classification of gene-expression data: The manifold-based metric learning way
    Lee, Jianguo
    Zhang, Changshui
    PATTERN RECOGNITION, 2006, 39 (12) : 2450 - 2463
  • [46] Classification of Microseismic Signals Using Machine Learning
    Chen, Ziyang
    Cui, Yi
    Pu, Yuanyuan
    Rui, Yichao
    Chen, Jie
    Mengli, Deren
    Yu, Bin
    PROCESSES, 2024, 12 (06)
  • [47] Lubrication Regime Classification of Hydrodynamic Journal Bearings by Machine Learning Using Torque Data
    Moder, Jakob
    Bergmann, Philipp
    Gruen, Florian
    LUBRICANTS, 2018, 6 (04)
  • [48] Microarray Gene Expression Data Classification via Wilcoxon Sign Rank Sum and Novel Grey Wolf Optimized Ensemble Learning Models
    Saheed, Yakub K.
    Balogun, Bukola F.
    Odunayo, Braimah Joseph
    Abdulsalam, Mustapha
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2023, 20 (06) : 3575 - 3587
  • [49] Rival penalized competitive learning (RPCL): a topology-determining algorithm for analyzing gene expression data
    Nair, TM
    Zheng, CL
    Fink, JL
    Stuart, RO
    Gribskov, M
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2003, 27 (06) : 565 - 574
  • [50] Learning structure in gene expression data using deep architectures, with an application to gene clustering
    Gupta, Aman
    Wang, Haohan
    Ganapathiraju, Madhavi
    PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, 2015 : 1328 - 1335