The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences

Cited by: 3
Authors
Tapinos, Avraam [1]
Constantinides, Bede [1,2]
Phan, My V. T. [3]
Kouchaki, Samaneh [1,4]
Cotten, Matthew [3,5,6]
Robertson, David L. [1,5]
Affiliations
[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England
[2] Univ Oxford, John Radcliffe Hosp, Modernising Med Microbiol Consortium, Nuffield Dept Clin Med, Oxford OX3 9DU, England
[3] Erasmus MC, Dept Virosci, Doctor Molewaterpl 40, NL-3015 GD Rotterdam, Netherlands
[4] Univ Oxford, Inst Biomed Engn, Dept Engn Sci, Oxford OX3 7DQ, England
[5] MRC Univ Glasgow, Ctr Virus Res, Glasgow G61 1QH, Lanark, Scotland
[6] MRC UVRI & LSHTM Uganda Res Unit Entebbe, POB 49, Entebbe, Uganda
Source
VIRUSES-BASEL | 2019 / Volume 11 / Issue 05
Funding
Biotechnology and Biological Sciences Research Council (UK); EU Horizon 2020; Wellcome Trust (UK);
Keywords
alignment; assembly; taxonomic classification; time series; data transformation; DWT; DFT; PAA; data compression; compressive genomics; TIME; ALGORITHM; DIMENSIONALITY; METAGENOMICS;
DOI
10.3390/v11050394
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
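As a rough illustration of the kind of dimensionality reduction the abstract refers to, the sketch below applies piecewise aggregate approximation (PAA, one of the listed keywords) to a numerically encoded read. The nucleotide-to-number mapping, the function names and the example read are all assumptions made for illustration; the record does not specify the representation the authors used, so this should not be read as their method.

```python
# Hypothetical nucleotide-to-number mapping (an assumption for illustration only;
# the article's actual numerical representation may differ).
BASE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(seq):
    """Map a DNA string to a list of floats; unrecognised symbols (e.g. N) become 0.0."""
    return [BASE_VALUES.get(base, 0.0) for base in seq.upper()]


def paa(signal, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments near-equal frames."""
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    reduced = []
    for i in range(n_segments):
        # Integer frame boundaries; frames differ in length by at most one element.
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        frame = signal[start:end]
        reduced.append(sum(frame) / len(frame))
    return reduced


def euclidean(a, b):
    """Euclidean distance between two equal-length reduced representations."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


# Hypothetical 16-base read compressed to 4 values (a 4-fold reduction).
read = "ACGTACGTTTGGCCAA"
reduced = paa(encode(read), 4)
```

Two reads reduced to the same number of segments can then be compared with `euclidean` on the short vectors rather than base by base, which is the sense in which comparison on compressed representations can be cheaper than comparison on the full sequences.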
Pages: 22