Indexing Next-Generation Sequencing data

Cited by: 6
Authors
Jalili, Vahid [1]
Matteucci, Matteo [1]
Masseroli, Marco [1]
Ceri, Stefano [1]
Affiliations
[1] Politecnico di Milano, DEIB, Piazza Leonardo da Vinci 32, Milan, Italy
Keywords
Genomic computing; Domain-specific data indexing; Region-based operations and calculus; Data integration; GENOMIC FEATURES; TEMPORAL DATA; COMPUTATION; OPERATIONS; ALGORITHM; BROWSER; SEARCH; TREES; QUERY;
DOI
10.1016/j.ins.2016.08.085
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Next-Generation Sequencing (NGS), also known as high-throughput sequencing, has opened the possibility of comprehensively characterizing the genomic and epigenomic landscapes, giving answers to fundamental questions in biological and clinical research, e.g., how DNA-protein interactions and chromatin structure affect gene activity, how cancer develops, and how much complex diseases such as diabetes or cancer depend on personal (epi)genomic traits, thus opening the road to personalized and precision medicine. In this context, our research has focused on sense-making, e.g., discovering how heterogeneous DNA regions concur to determine particular biological processes or phenotypes. Towards such discovery, characteristic operations on region data involve identifying co-occurrences of regions from different biological tests and/or of distinct semantic types, possibly within a certain distance from each other and/or from DNA regions with known structural or functional properties. In this paper, we present Di3, a 1D Interval Inverted Index, acting as a multi-resolution single-dimension data structure for interval-based data queries. Di3 is defined at the data access layer, independent of the data layer, business logic layer, and presentation layer; this design makes Di3 adaptable to any underlying persistence technology based on key-value pairs, ranging from classical B+ trees to LevelDB and Apache HBase, and makes it suitable for different business logic and presentation layer scenarios. We demonstrate the effectiveness of Di3 as a general-purpose genomic region manipulation tool with a console-level interface, and as a software component used within MuSERA, a tool for comparative analysis of region data replicates from NGS ChIP-seq and DNase-seq tests. (C) 2016 Elsevier Inc. All rights reserved.
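The idea of an inverted index over 1D intervals, stored as key-value pairs and queried for region co-occurrences, can be illustrated with a minimal Python sketch. This is not the Di3 implementation: the class name `IntervalInvertedIndex`, its methods, the half-open interval convention, and the in-memory sorted list standing in for an ordered key-value store are all assumptions made for illustration.

```python
from bisect import insort


class IntervalInvertedIndex:
    """Toy 1D inverted index (hypothetical, not the Di3 API).

    Each key is a coordinate; its value lists the ids of the intervals
    that start or end at that coordinate.
    """

    def __init__(self):
        self._keys = []    # sorted coordinates; stand-in for an ordered key-value store
        self._store = {}   # coordinate -> {"starts": [...], "ends": [...]}

    def _entry(self, coord):
        # Create the key on first use and keep the key list sorted.
        if coord not in self._store:
            insort(self._keys, coord)
            self._store[coord] = {"starts": [], "ends": []}
        return self._store[coord]

    def add(self, interval_id, start, end):
        """Index the half-open interval [start, end)."""
        if start >= end:
            raise ValueError("start must be smaller than end")
        self._entry(start)["starts"].append(interval_id)
        self._entry(end)["ends"].append(interval_id)

    def co_occurrences(self, min_count):
        """Return segments covered by at least `min_count` intervals.

        Sweeps the keys in order while tracking the set of intervals
        that are currently open.
        """
        result = []
        active = set()
        for i, coord in enumerate(self._keys):
            entry = self._store[coord]
            active.difference_update(entry["ends"])
            active.update(entry["starts"])
            if i + 1 < len(self._keys) and len(active) >= min_count:
                result.append((coord, self._keys[i + 1], sorted(active)))
        return result


# Three hypothetical regions on one chromosome.
index = IntervalInvertedIndex()
index.add("a", 10, 50)
index.add("b", 30, 70)
index.add("c", 40, 60)
pairs = index.co_occurrences(min_count=2)
triples = index.co_occurrences(min_count=3)
```

Because the sketch only needs ordered keys with attached values, the sorted list and dictionary could in principle be swapped for any ordered key-value backend, which is the kind of independence from the persistence layer that the abstract describes.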
Pages: 90-109
Number of pages: 20