OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features

Cited by: 4
Authors
Thafar, Maha A. [1, 2]
Albaradei, Somayah [1, 3]
Uludag, Mahmut [1]
Alshahrani, Mona [4]
Gojobori, Takashi [1]
Essack, Magbubah [1]
Gao, Xin [1]
Affiliations
[1] King Abdullah Univ Sci & Technol KAUST, Comp Elect & Math Sci & Engn Div CEMSE, Computat Biosci Res Ctr, Comp CBRC, Thuwal, Saudi Arabia
[2] Taif Univ, Coll Comp & Informat Technol, Comp Sci Dept, Taif, Saudi Arabia
[3] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
[4] Saudi Data & Artificial Intelligence Author SDAIA, Natl Ctr Artificial Intelligence NCAI, Riyadh, Saudi Arabia
Keywords
machine learning; sequence embedding; omics; target identification; lung cancer; colon cancer; bioinformatics; deep neural network; DRUG; IDENTIFICATION; KNOWLEDGEBASE; EXPRESSION; BIOMARKERS; PROTEINS; SEQUENCE; NETWORK
DOI
10.3389/fgene.2023.1139626
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Proper target identification is therefore needed, and computational approaches may make it possible, because effective targets have disease-relevant biological functions and omics data reveal the proteins involved in these functions. In addition, properties that favor binding between a drug and its target can be deduced from the protein's amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on the features of known effective targets. First, we created the "OncologyTT" datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes: omics features, BERT embeddings of the proteins' amino acid sequences, and the integrated features, and used each set separately to train and test the DL classifiers. The models achieved high prediction performance in terms of area under the curve (AUC): greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method, on that method's own data, in five of the seven cancer types assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
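As a rough illustration of two ideas in the abstract, the sketch below builds an "integrated" feature vector per gene and scores predictions by AUC. It is not the authors' implementation: joining the omics features and the sequence embedding by simple concatenation is an assumption (the abstract does not say how the features are integrated), and the function names `integrate_features` and `roc_auc` are hypothetical.

```python
from typing import Dict, List, Sequence


def integrate_features(omics: Dict[str, List[float]],
                       embeddings: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Join omics features and sequence embeddings per gene.

    Assumption: integration is plain concatenation. Genes missing from
    either input are dropped.
    """
    return {gene: omics[gene] + embeddings[gene]
            for gene in omics if gene in embeddings}


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of examples, which is fine for a small illustration; a library routine would be the usual choice for real datasets.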
Pages: 16