The Utility of Data Transformation for Alignment, De Novo Assembly and Classification of Short Read Virus Sequences

Cited by: 3
Authors
Tapinos, Avraam [1]
Constantinides, Bede [1,2]
Phan, My V. T. [3]
Kouchaki, Samaneh [1,4]
Cotten, Matthew [3,5,6]
Robertson, David L. [1,5]
Affiliations
[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England
[2] Univ Oxford, John Radcliffe Hosp, Modernising Med Microbiol Consortium, Nuffield Dept Clin Med, Oxford OX3 9DU, England
[3] Erasmus MC, Dept Virosci, Doctor Molewaterpl 40, NL-3015 GD Rotterdam, Netherlands
[4] Univ Oxford, Inst Biomed Engn, Dept Engn Sci, Oxford OX3 7DQ, England
[5] MRC Univ Glasgow, Ctr Virus Res, Glasgow G61 1QH, Lanark, Scotland
[6] MRC UVRI & LSHTM Uganda Res Unit Entebbe, POB 49, Entebbe, Uganda
Source
VIRUSES-BASEL | 2019 / Vol. 11 / Issue 05
Funding
Biotechnology and Biological Sciences Research Council (UK); EU Horizon 2020; Wellcome Trust (UK);
Keywords
alignment; assembly; taxonomic classification; time series; data transformation; DWT; DFT; PAA; data compression; compressive genomics; TIME; ALGORITHM; DIMENSIONALITY; METAGENOMICS;
DOI
10.3390/v11050394
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
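As a rough illustration of the idea in the abstract, the sketch below encodes a nucleotide read as numbers and compresses it with piecewise aggregate approximation (PAA), one of the methods named in the keywords. It is not the authors' implementation: the base-to-number mapping in `BASE_VALUES`, the function names `encode`, `paa` and `euclidean`, and the choice of Euclidean distance for comparing reduced reads are all assumptions made for this example.

```python
from math import sqrt

# Hypothetical numeric encoding of bases; the abstract does not specify one.
BASE_VALUES = {"A": 0.0, "C": 1.0, "G": 2.0, "T": 3.0}


def encode(seq):
    """Turn a nucleotide string into a list of numbers (assumed mapping)."""
    return [BASE_VALUES[base] for base in seq.upper()]


def paa(signal, n_segments):
    """Piecewise aggregate approximation: the mean of each of n_segments chunks."""
    n = len(signal)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(signal)")
    reduced = []
    for i in range(n_segments):
        # Integer boundaries keep every chunk non-empty when n_segments <= n.
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        chunk = signal[start:end]
        reduced.append(sum(chunk) / len(chunk))
    return reduced


def euclidean(a, b):
    """Euclidean distance between two equal-length reduced representations."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Usage: an 8-base read compressed to 4 values.
read = "AACCGGTT"
reduced = paa(encode(read), 4)  # [0.0, 1.0, 2.0, 3.0]
```

Comparing reads in the reduced space costs fewer operations per pair than comparing full-length sequences, at the price of losing detail within each segment. How much compression is acceptable for a given task is an empirical question that this sketch does not address.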
Pages: 22
Related papers
50 results in total
  • [1] De novo assembly of nucleotide sequences in a compressed feature space
    Tapinos, Avraam
    Robertson, David L.
    2017 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE IN BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (CIBCB), 2017, : 234 - 240
  • [2] Improving de novo Assembly Based on Read Classification
    Liao, Xingyu
    Li, Min
    Luo, Junwei
    Zou, You
    Wu, Fang-Xiang
    Pan, Yi
    Luo, Feng
    Wang, Jianxin
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2020, 17 (01) : 177 - 188
  • [3] Hybrid De Novo Genome Assembly Using MiSeq and SOLiD Short Read Data
    Ikegami, Tsutomu
    Inatsugi, Toyohiro
    Kojima, Isao
    Umemura, Myco
    Hagiwara, Hiroko
    Machida, Masayuki
    Asai, Kiyoshi
    PLOS ONE, 2015, 10 (04):
  • [4] Evaluating the Fidelity of De Novo Short Read Metagenomic Assembly Using Simulated Data
    Pignatelli, Miguel
    Moya, Andres
    PLOS ONE, 2011, 6 (05):
  • [5] Optimizing de novo assembly of short-read RNA-seq data for phylogenomics
    Yang, Ya
    Smith, Stephen A.
    BMC GENOMICS, 2013, 14
  • [6] Optimizing de novo assembly of short-read RNA-seq data for phylogenomics
    Ya Yang
    Stephen A Smith
    BMC Genomics, 14
  • [7] Assemblathon 1: A competitive assessment of de novo short read assembly methods
    Earl, Dent
    Bradnam, Keith
    St John, John
    Darling, Aaron
    Lin, Dawei
    Fass, Joseph
    Hung On Ken Yu
    Buffalo, Vince
    Zerbino, Daniel R.
    Diekhans, Mark
    Ngan Nguyen
    Ariyaratne, Pramila Nuwantha
    Sung, Wing-Kin
    Ning, Zemin
    Haimel, Matthias
    Simpson, Jared T.
    Fonseca, Nuno A.
    Birol, Inanc
    Docking, T. Roderick
    Ho, Isaac Y.
    Rokhsar, Daniel S.
    Chikhi, Rayan
    Lavenier, Dominique
    Chapuis, Guillaume
    Naquin, Delphine
    Maillet, Nicolas
    Schatz, Michael C.
    Kelley, David R.
    Phillippy, Adam M.
    Koren, Sergey
    Yang, Shiaw-Pyng
    Wu, Wei
    Chou, Wen-Chi
    Srivastava, Anuj
    Shaw, Timothy I.
    Ruby, J. Graham
    Skewes-Cox, Peter
    Betegon, Miguel
    Dimon, Michelle T.
    Solovyev, Victor
    Seledtsov, Igor
    Kosarev, Petr
    Vorobyev, Denis
    Ramirez-Gonzalez, Ricardo
    Leggett, Richard
    MacLean, Dan
    Xia, Fangfang
    Luo, Ruibang
    Li, Zhenyu
    Xie, Yinlong
    GENOME RESEARCH, 2011, 21 (12) : 2224 - 2241
  • [8] Effects of short read quality and quantity on a de novo vertebrate transcriptome assembly
    Garcia, T. I.
    Shen, Y.
    Catchen, J.
    Amores, A.
    Schartl, M.
    Postlethwait, J.
    Walter, R. B.
    COMPARATIVE BIOCHEMISTRY AND PHYSIOLOGY C-TOXICOLOGY & PHARMACOLOGY, 2012, 155 (01): : 95 - 101
  • [9] De novo assembly of human genomes with massively parallel short read sequencing
    Li, Ruiqiang
    Zhu, Hongmei
    Ruan, Jue
    Qian, Wubin
    Fang, Xiaodong
    Shi, Zhongbin
    Li, Yingrui
    Li, Shengting
    Shan, Gao
    Kristiansen, Karsten
    Li, Songgang
    Yang, Huanming
    Wang, Jian
    Wang, Jun
    GENOME RESEARCH, 2010, 20 (02) : 265 - 272
  • [10] Identifying wrong assemblies in de novo short read primary sequence assembly contigs
    Vandna Chawla
    Rajnish Kumar
    Ravi Shankar
    Journal of Biosciences, 2016, 41 : 455 - 474