The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences

Cited by: 3
Authors
Tapinos, Avraam [1]
Constantinides, Bede [1,2]
Phan, My V. T. [3]
Kouchaki, Samaneh [1,4]
Cotten, Matthew [3,5,6]
Robertson, David L. [1,5]
Affiliations
[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England
[2] Univ Oxford, John Radcliffe Hosp, Modernising Med Microbiol Consortium, Nuffield Dept Clin Med, Oxford OX3 9DU, England
[3] Erasmus MC, Dept Virosci, Doctor Molewaterpl 40, NL-3015 GD Rotterdam, Netherlands
[4] Univ Oxford, Inst Biomed Engn, Dept Engn Sci, Oxford OX3 7DQ, England
[5] MRC Univ Glasgow, Ctr Virus Res, Glasgow G61 1QH, Lanark, Scotland
[6] MRC UVRI & LSHTM Uganda Res Unit Entebbe, POB 49, Entebbe, Uganda
Source
VIRUSES-BASEL | 2019 / Volume 11 / Issue 05
Funding
Biotechnology and Biological Sciences Research Council (UK); EU Horizon 2020; Wellcome Trust (UK);
Keywords
alignment; assembly; taxonomic classification; time series; data transformation; DWT; DFT; PAA; data compression; compressive genomics; TIME; ALGORITHM; DIMENSIONALITY; METAGENOMICS;
DOI
10.3390/v11050394
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
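
To make the idea of a compressed numerical sequence representation concrete, the following is a minimal sketch of piecewise aggregate approximation (PAA), one of the transformations named in the keywords. It is an illustration only and not the authors' implementation: the nucleotide-to-number mapping `BASE_VALUES` and the helper names `to_signal`, `paa` and `euclidean` are assumptions introduced here, and the abstract does not state which numerical encoding the paper uses.

```python
from math import sqrt
from typing import List

# Hypothetical nucleotide-to-number mapping, chosen only for illustration;
# the abstract does not specify the encoding used in the paper.
BASE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def to_signal(seq: str) -> List[float]:
    """Convert a nucleotide string into a numeric series."""
    return [BASE_VALUES[base] for base in seq.upper()]


def paa(signal: List[float], segments: int) -> List[float]:
    """Piecewise aggregate approximation: split the series into
    `segments` contiguous chunks and keep the mean of each chunk."""
    n = len(signal)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(signal)")
    reduced = []
    for i in range(segments):
        start = i * n // segments
        end = (i + 1) * n // segments
        chunk = signal[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    read = paa(to_signal("AACCGGTT"), 4)
    ref = paa(to_signal("AACCGGTA"), 4)
    # Compare the two sequences in the reduced (4-value) space.
    print(read, ref, euclidean(read, ref))
```

Comparing sequences in the reduced space touches fewer values than comparing the full series, which is the kind of saving the abstract attributes to dimensionality reduction; how much accuracy is retained depends on the encoding and compression level, neither of which is given here.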
Pages: 22