CRISPR Detection From Short Reads Using Partial Overlap Graphs

Cited by: 4
Authors
Ben-Bassat, Ilan [1]
Chor, Benny [1]
Affiliation
[1] Tel Aviv Univ, Blavatnik Sch Comp Sci, Levanon St, IL-69978 Tel Aviv, Israel
Keywords
CRISPR detection; partial overlap graph; k-mer counting; filtering; sampling; IDENTIFICATION; BACTERIA; SYSTEM; GENE; TOOL
DOI
10.1089/cmb.2015.0226
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) are structured regions in bacterial and archaeal genomes, which are part of an adaptive immune system against phages. CRISPRs are important for many microbial studies and are playing an essential role in current gene editing techniques. As such, they attract substantial research interest. The exponential growth in the amount of bacterial sequence data in recent years enables the exploration of CRISPR loci in more and more species. Most of the automated tools that detect CRISPR loci rely on fully assembled genomes. However, many assemblers do not handle repetitive regions successfully. The first tool to work directly on raw sequence data is Crass, which requires reads that are long enough to contain two copies of the same repeat. We present a method to identify CRISPR repeats from raw sequence data of short reads. The algorithm is based on an observation differentiating CRISPR repeats from other types of repeats, and it involves a series of partial constructions of the overlap graph. This enables us to avoid many of the difficulties that assemblers face, as we merely aim to identify the repeats that belong to CRISPR loci. A preliminary implementation of the algorithm shows good results and detects CRISPR repeats in cases where other existing tools fail to do so.
Pages: 461-471
Page count: 11