Alignment of High-Throughput Sequencing Data Inside In-Memory Databases

Cited by: 3
Authors
Firnkorn, Daniel [1]
Knaup-Gregori, Petra [1]
Bermejo, Justo Lorenzo [1]
Ganzinger, Matthias [1]
Affiliations
[1] Inst Med Biometry & Informat, Heidelberg, Germany
Source
E-HEALTH - FOR CONTINUITY OF CARE | 2014 / Vol. 205
Keywords
In-Memory-Technology; DNA-Alignment; HANA; high-throughput sequencing; stored procedures;
DOI
10.3233/978-1-61499-432-9-476
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
In times of high-throughput DNA sequencing, efficient analysis of DNA sequences is of high importance. Computer-supported DNA analysis is still a time-consuming task. In this paper we explore the potential of a new in-memory database technology by using SAP's High Performance Analytic Appliance (HANA). We focus on read alignment as one of the first steps in DNA sequence analysis. In particular, we examined the widely used Burrows-Wheeler Aligner (BWA) and implemented stored procedures in both HANA and the free database system MySQL to compare execution time and memory management. To ensure that the results are comparable, MySQL was also run in memory, using its integrated MEMORY engine for table creation. We implemented stored procedures for exact and inexact searching of DNA reads within the reference genome GRCh37. Due to technical restrictions in SAP HANA concerning recursion, inexact matching could not be implemented on this platform. Hence, the performance of HANA and MySQL was compared using the execution time of the exact search procedures only. Here, HANA was approximately 27 times faster than MySQL, which indicates considerable potential in the new in-memory concepts and may lead to further development of DNA analysis procedures.
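The exact search mentioned in the abstract can be illustrated with a minimal Python sketch of backward search over a Burrows-Wheeler-transformed reference, the general technique BWA builds on. This is not the authors' stored procedure: the function names `build_fm_index` and `exact_search` are hypothetical, and the naive suffix sorting used here is an assumption that only suits short toy sequences, not a genome the size of GRCh37.

```python
def build_fm_index(reference):
    """Build a naive FM-index (suffix array, C table, Occ table) for a short text."""
    text = reference + "$"  # '$' is a sentinel that sorts before A, C, G, T
    # Naive suffix array: sort all suffix start positions by the suffix text.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around to '$').
    bwt = "".join(text[i - 1] for i in suffix_array)
    alphabet = sorted(set(text))
    # counts[c]: number of characters in text that sort before c.
    counts, total = {}, 0
    for c in alphabet:
        counts[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return suffix_array, counts, occ


def exact_search(read, index):
    """Return the sorted 0-based positions where read occurs exactly."""
    suffix_array, counts, occ = index
    lo, hi = 0, len(suffix_array)  # half-open suffix array interval [lo, hi)
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(read):
        if ch not in counts:
            return []  # character never occurs in the reference
        lo = counts[ch] + occ[ch][lo]
        hi = counts[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval is empty, so there is no exact match
    return sorted(suffix_array[lo:hi])


index = build_fm_index("GATTACAGATTACA")
print(exact_search("ATTA", index))
```

Each loop iteration narrows a suffix array interval using only two table lookups per character, which is why the search needs no recursion. Inexact matching, by contrast, is typically expressed as a recursive exploration of mismatches, which matches the restriction the abstract reports for HANA.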
Pages: 476-480
Page count: 5