DeepTRIAGE: interpretable and individualised biomarker scores using attention mechanism for the classification of breast cancer sub-types

Cited by: 20
Authors
Beykikhoshk, Adham [1]
Quinn, Thomas P. [1]
Lee, Samuel C. [1]
Truyen Tran [1]
Venkatesh, Svetha [1]
Affiliations
[1] Deakin Univ, Ctr Pattern Recognit & Data Analyt, Geelong, Vic, Australia
Keywords
Breast cancer; Precision medicine; TCGA; Deep learning; FEATURE-SELECTION; GENE; MACHINE
DOI
10.1186/s12920-020-0658-5
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Background: Breast cancer is a collection of multiple tissue pathologies, each with a distinct molecular signature that correlates with patient prognosis and response to therapy. Accurately differentiating between breast cancer sub-types is an important part of clinical decision-making. Although this problem has been addressed using machine learning methods in the past, there remains unexplained heterogeneity within the established sub-types that cannot be resolved by the commonly used classification algorithms.

Methods: In this paper, we propose a novel deep learning architecture, called DeepTRIAGE (Deep learning for the TRactable Individualised Analysis of Gene Expression), which uses an attention mechanism to obtain personalised biomarker scores that describe how important each gene is in predicting the cancer sub-type for each sample. We then perform a principal component analysis of these biomarker scores to visualise the sample heterogeneity, and use a linear model to test whether the major principal axes associate with known clinical phenotypes.

Results: Our model not only classifies cancer sub-types with good accuracy, but simultaneously assigns each patient their own set of interpretable and individualised biomarker scores. These personalised scores describe how important each feature is in the classification of any patient, and can be analysed post-hoc to generate new hypotheses about latent heterogeneity.

Conclusions: We apply the DeepTRIAGE framework to classify the gene expression signatures of luminal A and luminal B breast cancer sub-types, and illustrate its use for genes as well as the GO and KEGG gene sets. Using DeepTRIAGE, we calculate personalised biomarker scores that describe the most important features for classifying an individual patient as luminal A or luminal B. In doing so, DeepTRIAGE simultaneously reveals heterogeneity within the luminal A biomarker scores that significantly associate with tumour stage, placing all luminal samples along a continuum of severity.
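
The analysis pipeline described in the Methods paragraph (per-sample attention scores, then PCA of those scores, then a linear model on the leading axis) can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the additive tanh scoring form, the untrained random parameters `W` and `v`, the helper names, and the synthetic expression matrix and `stage` vector are all assumptions made purely for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_scores(X, W, v):
    """Per-sample importance scores over genes (hypothetical scoring form).

    X: (n_samples, n_genes) expression values
    W: (n_genes, d) one embedding vector per gene (assumed, untrained here)
    v: (d,) attention vector (assumed, untrained here)
    """
    # Scale each gene embedding by that gene's expression in each sample.
    H = X[:, :, None] * W[None, :, :]        # (n_samples, n_genes, d)
    logits = np.tanh(H) @ v                  # (n_samples, n_genes)
    # Normalise across genes so each sample's scores sum to 1.
    return softmax(logits, axis=1)


def pca(S, k=2):
    """Project the rows of S onto the top-k principal axes via SVD."""
    Sc = S - S.mean(axis=0, keepdims=True)
    U, s, _ = np.linalg.svd(Sc, full_matrices=False)
    return U[:, :k] * s[:k]


def fit_linear(pc, y):
    """Ordinary least squares of y on one principal axis: (intercept, slope)."""
    A = np.column_stack([np.ones_like(pc), pc])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


# Synthetic stand-ins for expression data and a clinical phenotype.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
W = rng.normal(size=(20, 8))
v = rng.normal(size=8)

scores = attention_scores(X, W, v)           # one score vector per sample
pcs = pca(scores, k=2)                       # coordinates for visualisation
stage = rng.integers(1, 5, size=50).astype(float)
intercept, slope = fit_linear(pcs[:, 0], stage)
```

In a trained model the parameters would be learned jointly with the sub-type classifier; here they are random, so the fitted slope carries no biological meaning and only shows the shape of the post-hoc analysis.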
Pages: 10