Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer

Cited by: 8
Authors
Afsari, Bahman [1]
Guo, Theresa [2]
Considine, Michael [1]
Florea, Liliana [3]
Kagohara, Luciane T. [1]
Stein-O'Brien, Genevieve L. [1]
Kelley, Dylan [2]
Flam, Emily [2]
Zambo, Kristina D. [2]
Ha, Patrick K. [4]
Geman, Donald [5]
Ochs, Michael F. [6]
Califano, Joseph A. [7]
Gaykalova, Daria A. [2]
Favorov, Alexander V. [1, 8]
Fertig, Elana J. [1]
Affiliations
[1] Johns Hopkins Univ, Dept Oncol, Div Biostat & Bioinformat, Sidney Kimmel Comprehens Canc Ctr, Baltimore, MD 21205 USA
[2] Johns Hopkins Univ, Dept Otolaryngol Head & Neck Surg, Baltimore, MD 21205 USA
[3] Johns Hopkins Univ, McKusick Nathans Inst Genet Med, Baltimore, MD 21205 USA
[4] Univ Calif San Francisco, Dept Otolaryngol Head & Neck Surg, San Francisco, CA 94158 USA
[5] Johns Hopkins Univ, Dept Appl Math & Stat, Baltimore, MD 21218 USA
[6] Coll New Jersey, Dept Math & Stat, Ewing, NJ 08628 USA
[7] Univ Calif San Diego, Dept Surg, Div Otolaryngol, San Diego, CA 92093 USA
[8] RAS, Vavilov Inst Gen Genet, Lab Syst Biol & Computat Genet, Moscow 119333, Russia
Funding
Russian Foundation for Basic Research; National Science Foundation (USA); National Institutes of Health (USA)
Keywords
RNA-SEQ; RECONSTRUCTION; TRANSCRIPTOME; REVEALS;
DOI
10.1093/bioinformatics/bty004
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Current bioinformatics methods for detecting changes in gene isoform usage between phenotypes compare the relative expected isoform usage in each phenotype. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions such as cancer can have broken regulation of splicing, which increases the heterogeneity of splice variant expression. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches.
Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage; we benchmark its performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage rather than heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer showed two things: an unanticipated similarity in the heterogeneity of gene isoform usage between HPV-positive and HPV-negative subtypes, and an anticipated increase in heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data.
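The abstract's central idea, comparing within-condition variability of junction expression profiles using ranks, can be illustrated with a small sketch. This is not the authors' SEVA implementation. The dissimilarity measure (the fraction of junction pairs whose relative order differs between two samples), the permutation test, all function names, and the toy data are assumptions chosen for illustration only.

```python
import itertools
import random


def rank_dissimilarity(a, b):
    """Fraction of junction pairs whose relative order differs between two samples."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    discordant = sum(1 for i, j in pairs if (a[i] - a[j]) * (b[i] - b[j]) < 0)
    return discordant / len(pairs)


def mean_within_group_dissimilarity(samples):
    """Average rank dissimilarity over all pairs of samples in one condition."""
    values = [rank_dissimilarity(x, y) for x, y in itertools.combinations(samples, 2)]
    return sum(values) / len(values)


def variability_difference(group_a, group_b):
    """Positive when group_a is more heterogeneous than group_b."""
    return (mean_within_group_dissimilarity(group_a)
            - mean_within_group_dissimilarity(group_b))


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """One-sided permutation test (an assumption of this sketch, not SEVA's test)."""
    rng = random.Random(seed)
    observed = variability_difference(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign condition labels at random
        if variability_difference(pooled[:n_a], pooled[n_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical junction expression for one gene (rows: samples, columns: junctions).
normal = [[10, 5, 1], [12, 6, 2], [9, 4, 1], [11, 5, 3]]  # same junction ordering
tumor = [[10, 5, 1], [1, 5, 10], [5, 10, 1], [1, 10, 5]]  # orderings vary

observed, p_value = permutation_p_value(tumor, normal)
print(f"variability difference = {observed:.3f}, permutation p = {p_value:.3f}")
```

In this toy example every normal sample ranks the three junctions the same way, so the within-normal dissimilarity is zero, while the tumor samples disagree on the ordering. Because the statistic depends only on the relative order of junctions within each sample, it is insensitive to monotone rescaling of the expression values.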
Pages: 1859-1867
Page count: 9