The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences

Cited by: 3
Authors
Tapinos, Avraam [1]
Constantinides, Bede [1, 2]
Phan, My V. T. [3]
Kouchaki, Samaneh [1, 4]
Cotten, Matthew [3, 5, 6]
Robertson, David L. [1, 5]
Affiliations
[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England
[2] Univ Oxford, John Radcliffe Hosp, Modernising Med Microbiol Consortium, Nuffield Dept Clin Med, Oxford OX3 9DU, England
[3] Erasmus MC, Dept Virosci, Doctor Molewaterpl 40, NL-3015 GD Rotterdam, Netherlands
[4] Univ Oxford, Inst Biomed Engn, Dept Engn Sci, Oxford OX3 7DQ, England
[5] MRC Univ Glasgow, Ctr Virus Res, Glasgow G61 1QH, Lanark, Scotland
[6] MRC UVRI & LSHTM Uganda Res Unit Entebbe, POB 49, Entebbe, Uganda
Source
VIRUSES-BASEL | 2019, Vol. 11, Issue 05
Funding
UK Biotechnology and Biological Sciences Research Council; EU Horizon 2020; Wellcome Trust (UK);
Keywords
alignment; assembly; taxonomic classification; time series; data transformation; DWT; DFT; PAA; data compression; compressive genomics; TIME; ALGORITHM; DIMENSIONALITY; METAGENOMICS;
DOI
10.3390/v11050394
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005; 100705;
Abstract
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
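To illustrate the kind of dimensionality reduction the abstract refers to, the following is a minimal sketch of piecewise aggregate approximation (PAA), one of the methods named in the keywords. It is not the authors' implementation: the nucleotide-to-number mapping `BASE_VALUES` and the helper names `to_signal`, `paa` and `compressed_distance` are assumptions made purely for illustration, and the abstract does not state which numerical representation the paper uses.

```python
import math
from statistics import fmean

# Hypothetical nucleotide-to-number mapping, chosen only for illustration;
# the abstract does not specify the representation used in the paper.
BASE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def to_signal(seq):
    """Map a nucleotide string to a list of numbers (unknown bases become 0.0)."""
    return [BASE_VALUES.get(base, 0.0) for base in seq.upper()]


def paa(signal, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments chunks."""
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    # Integer boundaries split the signal into near-equal, non-empty chunks.
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [fmean(signal[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


def compressed_distance(seq_a, seq_b, n_segments):
    """Euclidean distance between the PAA approximations of two sequences."""
    return math.dist(paa(to_signal(seq_a), n_segments),
                     paa(to_signal(seq_b), n_segments))


if __name__ == "__main__":
    # Two 8-base sequences differing at one position, compressed to 2 values each.
    print(paa(to_signal("ACGTACGT"), 2))
    print(compressed_distance("ACGTACGT", "ACGAACGT", 2))
```

Comparing the short approximations instead of the full sequences is what reduces the cost of each comparison; how much accuracy survives depends on the mapping and the number of segments, which this sketch does not evaluate.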
Pages: 22
Related papers
50 in total
  • [41] Overlap graphs and de Bruijn graphs: data structures for de novo genome assembly in the big data era
    Rizzi, Raffaella
    Beretta, Stefano
    Patterson, Murray
    Pirola, Yuri
    Previtali, Marco
    Della Vedova, Gianluca
    Bonizzoni, Paola
    QUANTITATIVE BIOLOGY, 2019, 7 (04) : 278 - 292
  • [42] Using GPUs for the Exact Alignment of Short-Read Genetic Sequences by Means of the Burrows-Wheeler Transform
    Salavert Torres, Jose
    Blanquer Espert, Ignacio
    Tomas Dominguez, Andres
    Hernandez Garcia, Vicente
    Medina Castello, Ignacio
    Tarraga Gimenez, Joaquin
    Dopazo Blazquez, Joaquin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2012, 9 (04) : 1245 - 1256
  • [43] Genetic Basis of Dorper Sheep (Ovis aries) Revealed by Long-Read De Novo Genome Assembly
    Qiao, Guoyan
    Xu, Pan
    Guo, Tingting
    Wu, Yi
    Lu, Xiaofang
    Zhang, Qingfeng
    He, Xue
    Zhu, Shaohua
    Zhao, Hongchang
    Lei, Zhihui
    Sun, Weibo
    Yang, Bohui
    Yue, Yaojing
    FRONTIERS IN GENETICS, 2022, 13
  • [44] Complete de novo assembly of Wolbachia endosymbiont of Drosophila willistoni using long-read genome sequencing
    Jacobs, Jodie
    Nakamoto, Anne
    Mastoras, Mira
    Loucks, Hailey
    Mirchandani, Cade
    Karim, Lily
    Penunuri, Gabriel
    Wanket, Ciara
    Russell, Shelbi L.
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [45] Non-referenced genome assembly from epigenomic short-read data
    Kaspi, Antony
    Ziemann, Mark
    Keating, Samuel T.
    Khurana, Ishant
    Connor, Timothy
    Spolding, Briana
    Cooper, Adrian
    Lazarus, Ross
    Walder, Ken
    Zimmet, Paul
    El-Osta, Assam
    EPIGENETICS, 2014, 9 (10) : 1329 - 1338
  • [46] A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing
    Nam V. Hoang
    Agnelo Furtado
    Patrick J. Mason
    Annelie Marquardt
    Lakshmi Kasirajan
    Prathima P. Thirugnanasambandam
    Frederik C. Botha
    Robert J. Henry
    BMC Genomics, 18
  • [47] Pebble and Rock Band: Heuristic Resolution of Repeats and Scaffolding in the Velvet Short-Read de Novo Assembler
    Zerbino, Daniel R.
    McEwen, Gayle K.
    Margulies, Elliott H.
    Birney, Ewan
    PLOS ONE, 2009, 4 (12):
  • [48] ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data
    Deng, Zachary
    Delwart, Eric
    BMC BIOINFORMATICS, 2021, 22 (01)
  • [50] Finding the right fit: evaluation of short-read and long-read sequencing approaches to maximize the utility of clinical microbiome data
    Gehrig, Jeanette L.
    Portik, Daniel M.
    Driscoll, Mark D.
    Jackson, Eric
    Chakraborty, Shreyasee
    Gratalo, Dawn
    Ashby, Meredith
    Valladares, Ricardo
    MICROBIAL GENOMICS, 2022, 8 (03):