IUPACpal: efficient identification of inverted repeats in IUPAC-encoded DNA sequences

Cited by: 8
Authors
Alamro, Hayam [1, 2]
Alzamel, Mai [1, 3]
Iliopoulos, Costas S. [1]
Pissis, Solon P. [4, 5]
Watts, Steven [1]
Affiliations
[1] Kings Coll London, Dept Informat, 30 Aldwych, London, England
[2] Princess Nourah bint Abdulrahman Univ, Dept Informat Syst, Riyadh, Saudi Arabia
[3] King Saud Univ, Comp Sci Dept, Riyadh, Saudi Arabia
[4] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[5] Vrije Univ Amsterdam, Amsterdam, Netherlands
Funding
EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Inverted repeat; Palindrome; Gaps; Mismatches; Software; IUPAC; CHROMOSOME; REGION; CRUCIFORM; XQ13;
DOI
10.1186/s12859-021-03983-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and have been linked with many possible functions. Many international consortia provide comprehensive descriptions of common genetic variation, making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results: We present IUPACpal, an exact tool for the efficient identification of inverted repeats in IUPAC-encoded DNA sequences, which also allows for mismatches and gaps in the inverted repeats. Conclusion: Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. IUPACpal identifies many inverted repeats that EMBOSS does not report, and it does so orders of magnitude faster.
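To make the definition in the abstract concrete, the following is a minimal Python sketch of a naive search for inverted repeats in an IUPAC-encoded sequence. It is not IUPACpal's algorithm, which the abstract does not describe; it is a brute-force illustration that runs in quadratic time. The function names (`pairs_with`, `inverted_repeats`), the parameter names, and the matching rule (two IUPAC symbols pair if any base encoded by one is the complement of any base encoded by the other) are assumptions made for this example.

```python
# Nucleotide sets for the standard IUPAC DNA codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pairs_with(x, y):
    """Assumed rule: x pairs with y if some base encoded by x
    has its complement among the bases encoded by y."""
    return any(COMPLEMENT[b] in IUPAC[y] for b in IUPAC[x])


def inverted_repeats(seq, min_len=3, max_gap=2, max_mismatches=0):
    """Naive search for inverted repeats.

    Returns a list of (left_start, arm_length, gap, mismatches) tuples
    with 0-based positions. For each centre and gap size, the arms are
    extended outwards as far as the mismatch budget allows.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for gap in range(max_gap + 1):
        # i is the last index of the left arm, j the first index of the right arm.
        for i in range(n - gap - 1):
            j = i + gap + 1
            length = mismatches = 0
            while i - length >= 0 and j + length < n:
                if not pairs_with(seq[i - length], seq[j + length]):
                    if mismatches == max_mismatches:
                        break
                    mismatches += 1
                length += 1
            if length >= min_len:
                hits.append((i - length + 1, length, gap, mismatches))
    return hits


if __name__ == "__main__":
    print(inverted_repeats("GAATTC", max_gap=0))    # plain DNA
    print(inverted_repeats("GARTYC", max_gap=0))    # IUPAC codes R and Y
    print(inverted_repeats("GAACCTTC", max_gap=2))  # arms separated by a gap
```

In this sketch, `GAATTC` is reported as a single repeat with arms of length 3 and no gap, because `GAA` is followed directly by its reverse complement `TTC`. The ambiguous sequence `GARTYC` is reported the same way under the assumed matching rule, since `R` can stand for `A` and `Y` for `T`.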
Pages: 12