Identifying candidate RNA-seq biomarkers for severity discrimination in chemical injuries: A machine learning and molecular dynamics approach

Cited by: 0
Authors
Arabfard, Masoud [1]
Behmard, Esmaeil [2]
Maghsoudloo, Mazaher [3]
Dadgar, Emad [4]
Parvin, Shahram [1]
Bagheri, Hasan [1]
Affiliations
[1] Baqiyatallah Univ Med Sci, Syst Biol & Poisonings Inst, Chem Injuries Res Ctr, Tehran, Iran
[2] Fasa Univ Med Sci, Sch Adv Technol Med, Fasa, Iran
[3] Southwest Med Univ, Res Ctr Preclin Med, Key Lab Epigenet & Oncol, Luzhou 646000, Sichuan, Peoples R China
[4] Baqiyatallah Univ Med Sci, Students Res Comm, Tehran, Iran
Keywords
Biomarkers; Machine Learning; RNA-Seq; Mustard Gas; Chemical injured; NEUTROPHILS; ALGORITHMS; MECHANISMS; MUSTARD; REPAIR; CXCR1
DOI
10.1016/j.intimp.2025.114090
Chinese Library Classification number
R392 [Medical immunology]; Q939.91 [Immunology]
Discipline classification code
100102
Abstract
Introduction: Biomarkers provide insight into biological responses to interventions, and high-throughput gene expression profiling makes it possible to discover them in a data-driven way from large datasets. This study identifies candidate biomarkers in gene expression data related to chemical injury by mustard gas, covering a spectrum from healthy individuals to severe injuries.

Materials and methods: The RNA-Seq dataset comprised 52 samples with expression values for 54,583 gene transcripts. Samples were assigned to four classes based on the GOLD classification for chemically injured individuals: Severe (n = 14), Moderate (n = 11), Mild (n = 16), and healthy controls (n = 11). Data preparation involved examining an Excel file generated in the R programming environment with the MLSeq and devtools packages. Feature selection was performed with a Genetic Algorithm (GA) and Simulated Annealing (SA), and a Random Forest algorithm was used for classification. Ab initio methods were used for computational efficiency and accuracy of results, and molecular dynamics simulation served as a virtual experiment bridging experiment and theory.

Results: A total of 12 models were created, each yielding a list of differentially expressed genes as potential biomarkers. Model performance varied across group comparisons, with GA outperforming SA in most cases. For the Severe vs. Moderate comparison, GA achieved the best performance, with an accuracy of 94.38%, recall of 91.64%, and specificity of 97.10%. SA performed better in specific comparisons involving the Moderate and Mild groups. The candidate biomarkers were evaluated against the gene expression data to assess their expression changes between groups of chemically injured individuals. Four genes were selected for further investigation based on expression level: CXCR1, EIF2B2, RAD51, and RXFP2. Their expression levels were analyzed to determine differential expression between the groups.

Conclusion: This study was designed as a computational effort to identify diagnostic biomarkers in basic biological systems research. The findings propose a list of discriminative biomarkers capable of distinguishing between groups of chemically injured individuals. The key genes identified could serve as indicators of chemical injury severity, and further investigation is needed to validate their clinical relevance and utility in diagnosis and treatment.
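
As a rough illustration of the workflow described in the abstract, the sketch below shows a simulated-annealing search over feature subsets and the three reported metrics (accuracy, recall, specificity) for a two-class comparison. It is not the authors' implementation: the function names, the cooling schedule, and the toy scoring function are assumptions made for illustration. In the study, the subset score would come from a Random Forest classifier trained on the expression data.

```python
import math
import random


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, recall and specificity for a two-class comparison."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def anneal_features(score, n_features, steps=200, t0=1.0, cooling=0.98, seed=0):
    """Simulated annealing over boolean feature masks (hypothetical settings).

    `score` maps a mask (list of bools) to a number to maximise.
    Returns the best mask seen and its score.
    """
    rng = random.Random(seed)
    current = [rng.random() < 0.5 for _ in range(n_features)]
    current_score = score(current)
    best, best_score = current[:], current_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour by flipping one randomly chosen feature.
        candidate = current[:]
        i = rng.randrange(n_features)
        candidate[i] = not candidate[i]
        cand_score = score(candidate)
        delta = cand_score - current_score
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current[:], current_score
        temp *= cooling
    return best, best_score


def toy_score(mask):
    """Stand-in for classifier accuracy (assumption, not real data):
    rewards two 'informative' features and penalises subset size."""
    informative = (0, 3)
    return sum(mask[i] for i in informative) - 0.1 * sum(mask)


if __name__ == "__main__":
    mask, value = anneal_features(toy_score, n_features=8)
    print("selected features:", [i for i, keep in enumerate(mask) if keep])
    print("score:", value)
    # Hypothetical labels, only to show how the metrics are computed.
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

A Genetic Algorithm variant would replace the single-flip proposal with a population of masks combined by crossover and mutation, using the same scoring function.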
Pages: 13