Deep Learning Based Tumor Type Classification Using Gene Expression Data

Cited by: 90

Authors:

Lyu, Boyu [1]

Haque, Anamul [1]

Affiliations:

[1] Virginia Tech, Blacksburg, VA 24061 USA

Source:

ACM-BCB'18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS | 2018

Keywords:

Deep Learning; Tumor Type Classification; Pan-Cancer Atlas; Convolutional Neural Network; B-CELL LYMPHOMA; CANCER;

DOI:

10.1145/3233547.3233588

Chinese Library Classification (CLC) number:

Q [Biological Sciences];

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Differential analysis is the most significant part of RNA-Seq analysis. Conventional differential analysis methods usually match tumor samples to normal samples from the same tumor type. Such methods fail to differentiate between tumor types because they lack knowledge from other tumor types. The Pan-Cancer Atlas provides abundant information on 33 prevalent tumor types, which can be used as prior knowledge to generate tumor-specific biomarkers. In this paper, we embedded the high-dimensional RNA-Seq data into 2-D images and used a convolutional neural network to classify the 33 tumor types. The final accuracy was 95.59%. Furthermore, based on the idea of Guided Grad-CAM, we generated a significance heat-map over all genes for each class. Through functional analysis of the genes with high intensities in the heat-maps, we validated that these top genes are related to tumor-specific pathways, and some of them have already been used as biomarkers, which demonstrates the effectiveness of our method. As far as we know, we are the first to apply a convolutional neural network to the Pan-Cancer Atlas for tumor type classification, and also the first to use each gene's contribution to the classification as a measure of gene importance for identifying candidate biomarkers. Our experimental results show that our method performs well and could also be applied to other genomics data.
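The abstract's step of embedding high-dimensional RNA-Seq data into 2-D images can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the hypothetical function name `expression_to_image`, the log2(x + 1) scaling, the zero padding, and the row-major gene order. The paper's actual preprocessing and gene ordering may differ.

```python
import math

import numpy as np


def expression_to_image(expr, side=None):
    """Embed a 1-D gene expression vector into a square 2-D array.

    Hypothetical helper: the scaling, padding and gene order are
    illustrative assumptions, not the authors' published procedure.
    """
    expr = np.asarray(expr, dtype=np.float64)
    if expr.ndim != 1:
        raise ValueError("expected a 1-D expression vector")

    n = expr.shape[0]
    if side is None:
        # Smallest square that can hold all n genes.
        side = math.isqrt(n)
        if side * side < n:
            side += 1
    elif side * side < n:
        raise ValueError("side is too small to hold every gene")

    # Assumed scaling: log2(x + 1) compresses the wide dynamic range of counts.
    scaled = np.log2(expr + 1.0)

    # Zero-pad the tail so the vector fills the square, then reshape row by row.
    padded = np.zeros(side * side, dtype=np.float64)
    padded[:n] = scaled
    return padded.reshape(side, side)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.integers(0, 1000, size=10_000)  # synthetic counts, not real data
    image = expression_to_image(sample)
    print(image.shape)
```

An array produced this way could be fed to any image classifier as a single-channel input. Fixing the gene order across all samples matters, since a convolutional network relies on each pixel position meaning the same gene in every image.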

Pages: 89-96

Page count: 8
