viGEN: An Open Source Pipeline for the Detection and Quantification of Viral RNA in Human Tumors

Cited by: 18
Authors
Bhuvaneshwar, Krithika [1]
Song, Lei [1]
Madhavan, Subha [1]
Gusev, Yuriy [1]
Affiliations
[1] Georgetown Univ, Innovat Ctr Biomed Informat, Washington, DC 20057 USA
Keywords
RNA-seq; viral detection; liver cancer; TCGA; variant analysis; next-generation sequencing; cancer immunology; ENDOGENOUS RETROVIRUS K; HEPATITIS-B; READ ALIGNMENT; VIRUS; CANCER; HERV-K113; SOFTWARE; IDENTIFY; MUTATION;
DOI
10.3389/fmicb.2018.01172
Chinese Library Classification
Q93 [Microbiology];
Discipline classification codes
071005 ; 100705 ;
Abstract
An estimated 17% of cancers worldwide are associated with infectious causes. The extent and biological significance of viral presence or infection in actual tumor samples is generally unknown, but it could be measured using human transcriptome (RNA-seq) data from tumor samples. We present an open source bioinformatics pipeline, viGEN, which allows not only the detection and quantification of viral RNA but also the detection of variants in the viral transcripts. The pipeline includes four major modules. The first module aligns and filters out human RNA sequences. The second module maps and counts the remaining unaligned reads against reference genomes of all known and sequenced human viruses. The third module quantifies read counts at the individual viral-gene level, thus allowing for downstream differential expression analysis of viral genes between case and control groups. The fourth module calls variants in these viruses. To the best of our knowledge, there are no publicly available pipelines or packages that provide this type of complete analysis in one open source package. In this paper, we applied the viGEN pipeline to two case studies. We first demonstrate the working of our pipeline on a large public dataset, the TCGA cervical cancer cohort. In the second case study, we performed an in-depth analysis on a small focused study of TCGA liver cancer patients. In the latter cohort, we performed viral-gene quantification, viral-variant extraction, and survival analysis. This allowed us to find differentially expressed viral transcripts and viral variants between the groups of patients, and to connect them to clinical outcome. From our analyses, we show that we were able to successfully detect the human papillomavirus among the TCGA cervical cancer patients. We compared the viGEN pipeline with two metagenomics tools and demonstrated similar sensitivity and specificity. We were also able to quantify viral transcripts and extract viral variants using the liver cancer dataset.
The results presented corresponded with published literature in terms of rate of detection and the impact of several known variants of the HBV genome. This pipeline is generalizable and can be used to provide novel biological insights into microbial infections in complex diseases and tumorigenesis. Our viral pipeline could be used in conjunction with additional types of immuno-oncology analysis based on RNA-seq data of host RNA for cancer immunology applications. The source code, with example data and a tutorial, is available at: https://github.com/ICBI/viGEN/.
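The counting step of the second module can be illustrated with a minimal Python sketch. This is not the viGEN implementation. It assumes that alignments of the non-human reads have already been parsed into (read ID, viral reference name, mapping quality) tuples; the function names, the `min_mapq` threshold, and the reads-per-million normalization are hypothetical choices made only for this example.

```python
from collections import Counter


def count_viral_reads(alignments, min_mapq=20):
    """Count reads per viral reference genome.

    `alignments` is an iterable of (read_id, virus_name, mapq) tuples for
    reads that did NOT align to the human genome. `virus_name` is None when
    the read also failed to align to any viral reference.
    `min_mapq` is an illustrative threshold, not a viGEN default.
    """
    counts = Counter()
    for _read_id, virus_name, mapq in alignments:
        if virus_name is not None and mapq >= min_mapq:
            counts[virus_name] += 1
    return counts


def reads_per_million(counts, total_reads):
    """Scale raw counts by sequencing depth (total reads in the sample)."""
    return {virus: n * 1_000_000 / total_reads for virus, n in counts.items()}


if __name__ == "__main__":
    # Toy input; the virus names and qualities are made up for illustration.
    example = [
        ("r1", "HBV", 60),
        ("r2", "HBV", 10),    # dropped: below min_mapq
        ("r3", "HPV16", 40),
        ("r4", None, 0),      # dropped: no viral alignment
        ("r5", "HBV", 30),
    ]
    raw = count_viral_reads(example)
    print(dict(raw))
    print(reads_per_million(raw, total_reads=1000))
```

Normalizing by total read depth makes counts comparable across samples sequenced to different depths, which matters before any comparison between case and control groups.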
Pages: 13