Indexing Next-Generation Sequencing data

Cited by: 6
Authors
Jalili, Vahid [1]
Matteucci, Matteo [1]
Masseroli, Marco [1]
Ceri, Stefano [1]
Affiliation
[1] Politecn Milan, DEIB, Piazza Leonardo da Vinci 32, Milan, Italy
Keywords
Genomic computing; Domain-specific data indexing; Region-based operations and calculus; Data integration; GENOMIC FEATURES; TEMPORAL DATA; COMPUTATION; OPERATIONS; ALGORITHM; BROWSER; SEARCH; TREES; QUERY
DOI
10.1016/j.ins.2016.08.085
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Next-Generation Sequencing (NGS), also known as high-throughput sequencing, has opened the possibility of a comprehensive characterization of the genomic and epigenomic landscapes, helping to answer fundamental questions for biological and clinical research, e.g., how DNA-protein interactions and chromatin structure affect gene activity, how cancer develops, and how much complex diseases such as diabetes or cancer depend on personal (epi)genomic traits, thus opening the road to personalized and precision medicine. In this context, our research has focused on sense-making, e.g., discovering how heterogeneous DNA regions concur to determine particular biological processes or phenotypes. Towards such discovery, characteristic operations on region data involve identifying co-occurrences of regions from different biological tests and/or of distinct semantic types, possibly within a certain distance from each other and/or from DNA regions with known structural or functional properties. In this paper, we present Di3, a 1D Interval Inverted Index, which acts as a multi-resolution single-dimension data structure for interval-based data queries. Di3 is defined at the data access layer, independently of the data, business logic, and presentation layers; this design makes Di3 adaptable to any underlying persistence technology based on key-value pairs, ranging from classical B+ trees to LevelDB and Apache HBase, and makes it suitable for different business logic and presentation layer scenarios. We demonstrate the effectiveness of Di3 both as a general-purpose genomic region manipulation tool with a console-level interface and as a software component used within MuSERA, a tool for comparative analysis of region data replicates from NGS ChIP-seq and DNase-seq tests. (C) 2016 Elsevier Inc. All rights reserved.
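To make the kind of region query described in the abstract concrete (co-occurring regions, possibly within a given distance), here is a minimal Python sketch of a one-dimensional interval index keyed by region boundaries. It is an illustrative assumption, not Di3's actual data layout or API: the names `Region`, `BoundaryIndex`, `overlapping`, `within_distance`, and `cooccurrences` are hypothetical, coordinates are assumed to be half-open `[start, end)` on a single chromosome, and a sorted Python list stands in for the ordered key-value store (B+ tree, LevelDB, HBase) the abstract mentions.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical genomic region on one chromosome, half-open [start, end)."""
    rid: str
    start: int
    end: int


class BoundaryIndex:
    """Toy 1D interval index keyed by region boundaries (not Di3's real layout).

    Each distinct boundary coordinate is a key; its value is the frozen set of
    region ids active at that coordinate (start <= key < end). The active set
    is constant between consecutive keys, so a query only scans keys inside
    the query window plus the one key just before it.
    """

    def __init__(self, regions):
        starts_at, ends_at = defaultdict(set), defaultdict(set)
        for r in regions:
            if r.start >= r.end:
                raise ValueError(f"empty or inverted region: {r}")
            starts_at[r.start].add(r.rid)
            ends_at[r.end].add(r.rid)
        # Sorted keys stand in for an ordered key-value store (assumption).
        self.keys = sorted(set(starts_at) | set(ends_at))
        self.snapshots = []
        active = set()
        for k in self.keys:  # single left-to-right sweep
            active -= ends_at[k]  # half-open: a region ending at k is inactive at k
            active |= starts_at[k]
            self.snapshots.append(frozenset(active))

    def overlapping(self, qs, qe):
        """Return ids of regions overlapping the half-open window [qs, qe)."""
        if qs >= qe:
            return set()
        # Largest key <= qs: its snapshot holds the regions active at qs.
        i = max(bisect_right(self.keys, qs) - 1, 0)
        # Keys strictly below qe: their snapshots add regions starting in the window.
        j = bisect_left(self.keys, qe)
        result = set()
        for idx in range(i, j):
            result |= self.snapshots[idx]
        return result

    def within_distance(self, qs, qe, d):
        """Regions overlapping [qs, qe) or separated from it by fewer than d bases."""
        return self.overlapping(qs - d, qe + d)


def cooccurrences(index, queries, d=0):
    """Map each query region id to the sorted ids of indexed regions near it."""
    return {q.rid: sorted(index.within_distance(q.start, q.end, d)) for q in queries}


if __name__ == "__main__":
    sample = [Region("a1", 10, 20), Region("a2", 15, 40), Region("a3", 50, 60)]
    idx = BoundaryIndex(sample)
    print(sorted(idx.overlapping(18, 19)))  # ['a1', 'a2']
    print(sorted(idx.overlapping(20, 50)))  # ['a2']
    print(cooccurrences(idx, [Region("p1", 45, 46)], d=6))  # {'p1': ['a2', 'a3']}
```

Storing a full snapshot of active ids at every key keeps the query path simple, but memory grows with the number of keys times the typical overlap depth; a persistent index would more likely store compact per-key entries. Multiple chromosomes could be handled, under the same assumptions, with one such index per chromosome.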
Pages: 90-109
Number of pages: 20
Related papers
50 papers in total
  • [41] SeedsGraph: an efficient assembler for next-generation sequencing data
    Wang, Chunyu
    Guo, Maozu
    Liu, Xiaoyan
    Liu, Yang
    Zou, Quan
    BMC MEDICAL GENOMICS, 8
  • [42] The Promises and Pitfalls of Next-Generation Sequencing Data in Phylogeography
    Carstens, Bryan
    Lemmon, Alan R.
    Lemmon, Emily Moriarty
    SYSTEMATIC BIOLOGY, 2012, 61 (05) : 713 - 715
  • [43] Model Testing of PluriTest with Next-Generation Sequencing Data
    Schulze, Markus
    Hoja, Sabine
    Winner, Beate
    Winkler, Juergen
    Edenhofer, Frank
    Riemenschneider, Markus J.
    STEM CELLS AND DEVELOPMENT, 2016, 25 (07) : 569 - 571
  • [44] The Genome Assembly Model for Next-Generation Sequencing Data
    Wang, Yirong
    Wei, Chengdong
    Zhang, Xiaodong
    Cen, Tailin
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON APPLIED MATHEMATICS, MODELLING AND STATISTICS APPLICATION (AMMSA 2017), 2017, 141 : 97 - 101
  • [46] Next-Generation Anchor Based Phylogeny (NexABP): Constructing phylogeny from Next-generation sequencing data
    Roychowdhury, Tanmoy
    Vishnoi, Anchal
    Bhattacharya, Alok
    SCIENTIFIC REPORTS, 2013, 3
  • [47] HUMAN DISEASE Next-generation sequencing of the next generation
    Burgess, Darren J.
    NATURE REVIEWS GENETICS, 2011, 12 (02) : 78 - 79
  • [48] Next-generation sequencing in epigenetics
    Zeschnigk, Michael
    Horsthemke, Bernhard
    MEDIZINISCHE GENETIK, 2019, 31 (02) : 205 - 211
  • [49] The chemistry of next-generation sequencing
    Rodriguez, Raphaël
    Krishnan, Yamuna
    NATURE BIOTECHNOLOGY, 2023, 41 : 1709 - 1715
  • [50] Next-generation sequencing in the clinic
    Park, Jason Y.
    Kricka, Larry J.
    Fortina, Paolo
    NATURE BIOTECHNOLOGY, 2013, 31 : 990 - 992