Distance indexing and seed clustering in sequence graphs

被引:11
作者
Chang, Xian [1 ]
Eizenga, Jordan [1 ]
Novak, Adam M. [1 ]
Siren, Jouni [1 ]
Paten, Benedict [1 ]
机构
[1] Univ Calif Santa Cruz, Dept Biomol Engn, Genom Inst, Santa Cruz, CA 95060 USA
基金
美国国家卫生研究院;
关键词
D O I
10.1093/bioinformatics/btaa446
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
Motivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping. Results: We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs.
引用
收藏
页码:146 / 153
页数:8
相关论文
共 20 条
  • [1] Akiba Takuya, 2013, P ACM SIGMOD INT C M, P349, DOI [DOI 10.1145/2463676.2465315, 10.1145/2463676, DOI 10.1145/2463676]
  • [2] Dave Vachik S., 2015, Database and Expert Systems Applications. 26th International Conference, DEXA 2015. Proceedings: LNCS 9261, P471, DOI 10.1007/978-3-319-22849-5_32
  • [3] Dijkstra E. W., 1959, NUMER MATH, V1, P1, DOI DOI 10.1007/BF01386390
  • [4] Djidjev HN, 1997, LECT NOTES COMPUT SC, V1197, P151
  • [5] Variation graph toolkit improves read mapping by representing genetic variation in the reference
    Garrison, Erik
    Siren, Jouni
    Novak, Adam M.
    Hickey, Glenn
    Eizenga, Jordan M.
    Dawson, Eric T.
    Jones, William
    Garg, Shilpa
    Markello, Charles
    Lin, Michael F.
    Paten, Benedict
    Durbin, Richard
    [J]. NATURE BIOTECHNOLOGY, 2018, 36 (09) : 875 - +
  • [6] A FORMAL BASIS FOR HEURISTIC DETERMINATION OF MINIMUM COST PATHS
    HART, PE
    NILSSON, NJ
    RAPHAEL, B
    [J]. IEEE TRANSACTIONS ON SYSTEMS SCIENCE AND CYBERNETICS, 1968, SSC4 (02): : 100 - +
  • [7] Jain C., 2019, LEIBNIZ INT P INFORM, V142
  • [8] Lauther Ulrich, 2004, Geoinformation und Mobilitaet-von der Forschung zur praktischen Anwendung, V22, P219
  • [9] Minimap and miniasm: fast mapping and de novo assembly for noisy long sequences
    Li, Heng
    [J]. BIOINFORMATICS, 2016, 32 (14) : 2103 - 2110
  • [10] Computational pan-genomics: status, promises and challenges
    Marschall, Tobias
    Marz, Manja
    Abeel, Thomas
    Dijkstra, Louis
    Dutilh, Bas E.
    Ghaffaari, Ali
    Kersey, Paul
    Kloosterman, Wigard P.
    Makinen, Veli
    Novak, Adam M.
    Paten, Benedict
    Porubsky, David
    Rivals, Eric
    Alkan, Can
    Baaijens, Jasmijn A.
    De Bakker, Paul I. W.
    Boeva, Valentina
    Bonnal, Raoul J. P.
    Chiaromonte, Francesca
    Chikhi, Rayan
    Ciccarelli, Francesca D.
    Cijvat, Robin
    Datema, Erwin
    Van Duijn, Cornelia M.
    Eichler, Evan E.
    Ernst, Corinna
    Eskin, Eleazar
    Garrison, Erik
    El-Kebir, Mohammed
    Klau, Gunnar W.
    Korbel, Jan O.
    Lameijer, Eric-Wubbo
    Langmead, Benjamin
    Martin, Marcel
    Medvedev, Paul
    Mu, John C.
    Neerincx, Pieter
    Ouwens, Klaasjan
    Peterlongo, Pierre
    Pisanti, Nadia
    Rahmann, Sven
    Raphael, Ben
    Reinert, Knut
    de Ridder, Dick
    de Ridder, Jeroen
    Schlesner, Matthias
    Schulz-Trieglaff, Ole
    Sanders, Ashley D.
    Sheikhizadeh, Siavash
    Shneider, Carl
    [J]. BRIEFINGS IN BIOINFORMATICS, 2018, 19 (01) : 118 - 135