Dissecting Crucial Gene Markers Involved in HPV-Associated Oropharyngeal Squamous Cell Carcinoma from RNA-Sequencing Data through Explainable Artificial Intelligence

Cited by: 1
Authors
Sekaran, Karthik [1]
Varghese, Rinku Polachirakkal [1]
Krishnan, Sasikumar [2]
Zayed, Hatem [3]
El Allali, Achraf [4]
Doss, George Priya C. [1]
Affiliations
[1] Vellore Inst Technol, Sch Biosci & Technol, Vellore 632014, India
[2] Vellore Inst Technol, Sch Elect Engn, Dept Sensor & Biomed Technol, Vellore 632014, India
[3] Qatar Univ, Coll Hlth Sci, Dept Biomed Sci, QU Hlth, Doha 2713, Qatar
[4] Mohammed VI Polytech Univ, Coll Comp, Bioinformat Lab, Ben Guerir 43150, Morocco
Source
FRONTIERS IN BIOSCIENCE-LANDMARK | 2024 / Vol. 29 / Issue 06
Keywords
biomarker discovery; explainable artificial intelligence; human papillomavirus; oropharyngeal squamous cell carcinoma; RNA-sequencing; shapley additive explanations; HUMAN-PAPILLOMAVIRUS; CANCER; HEAD; BIOMARKERS; EXPRESSION; DIAGNOSIS; RISK;
DOI
10.31083/j.fbl2906220
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Background: The worldwide incidence of oropharyngeal squamous cell carcinoma (OPSCC) is alarming. Clinicians urgently need to understand the etiology of OPSCC in order to administer effective treatments. Methods: This study presents an integrative genomics approach for identifying key oncogenic drivers involved in OPSCC pathogenesis. The dataset contains RNA-Sequencing (RNA-Seq) samples from 46 human papillomavirus-positive head and neck squamous cell carcinoma cases and 25 normal uvulopalatopharyngoplasty cases. Differential marker selection between the groups, using a log2FoldChange (FC) score of 2 and an adjusted p-value < 0.01, screened 714 genes. The Particle Swarm Optimization (PSO) algorithm then selected a candidate gene subset, reducing the size to 73. State-of-the-art machine learning algorithms were trained on both the differentially expressed genes and the PSO candidate subset. Results: Analysis of the predictive models using Shapley Additive exPlanations revealed that seven genes contribute significantly to model performance. ECT2, LAMC2, and DSG2 predominantly influence the differentiation between sample groups, followed in importance by FAT1, PLOD2, COL1A1, and PLAU. The Random Forest and Bayes Net algorithms also achieved perfect validation scores when using PSO features. Furthermore, gene set enrichment analysis, protein-protein interactions, and disease ontology mining revealed a significant association between these genes and the target condition. As indicated by Shapley Additive exPlanations (SHAP), the survival analysis of three key genes unveiled strong over-expression in samples from The Cancer Genome Atlas. Conclusions: Our findings elucidate critical oncogenic drivers in OPSCC, offering vital insights for developing targeted therapies and enhancing the understanding of its pathogenesis.
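The differential marker selection step described in the abstract can be pictured as a simple threshold filter. The sketch below is illustrative only and is not the authors' pipeline: the names `GeneStat` and `select_differential_genes` are hypothetical, the gene values are made up, and applying the fold-change cutoff to the absolute value (so that both up- and down-regulated genes pass) is an assumption, since the abstract states only a score of 2 and an adjusted p-value < 0.01.

```python
from typing import List, NamedTuple


class GeneStat(NamedTuple):
    """Per-gene summary from a differential expression test (hypothetical type)."""
    gene: str
    log2_fold_change: float
    adjusted_p: float


def select_differential_genes(
    stats: List[GeneStat],
    min_abs_log2fc: float = 2.0,   # cutoff quoted in the abstract
    max_adjusted_p: float = 0.01,  # cutoff quoted in the abstract
) -> List[str]:
    """Return genes passing both thresholds.

    Assumption: the fold-change cutoff is applied to the absolute value.
    """
    return [
        s.gene
        for s in stats
        if abs(s.log2_fold_change) >= min_abs_log2fc
        and s.adjusted_p < max_adjusted_p
    ]


# Made-up values for illustration only; not taken from the study.
example = [
    GeneStat("GENE_A", 3.1, 0.0004),   # passes both thresholds
    GeneStat("GENE_B", -2.5, 0.002),   # passes (down-regulated)
    GeneStat("GENE_C", 1.2, 0.0001),   # fails the fold-change cutoff
    GeneStat("GENE_D", 4.0, 0.03),     # fails the adjusted p-value cutoff
]
selected = select_differential_genes(example)
print(selected)
```

In the study, a filter of this kind reduced the expression data to 714 genes before PSO narrowed the set to 73; the sketch covers only the thresholding, not the statistical test that produces the fold changes and adjusted p-values.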
Pages: 12
Related papers
1 in total
  • [1] Infiltrating B-cell subtypes and associated hub genes in nasopharyngeal carcinoma identified from integrated single-cell, bulk RNA-sequencing, and immunohistochemical data
    Zhong, Fangyan
    Chen, Junjun
    Lu, Tianzhu
    Zhang, Lin
    Liu, Zhiliang
    Guan, Chunhong
    Xiong, Xiaopeng
    Gong, Xiaochang
    Li, Jingao
    HEREDITAS, 2025, 162 (01):