A Fast longest crossing-plain preserving common subsequence algorithm

被引:0
作者
Kenawy T.G. [1 ]
Abdel-Rahman M.H. [1 ]
Bahig H.M. [1 ]
机构
[1] Computer Science Division, Mathematics Department, Faculty of Science, Ain Shams University, Cairo
关键词
Arc-preserving common subsequence; Dynamic programming; Longest common subsequence; Pattern matching; RNA structure;
D O I
10.1007/s41870-022-01038-0
中图分类号
学科分类号
摘要
In molecular biology, the comparison of ribonucleic acid (RNA) structures is important in predicting RNA folding and exploring RNA conserved and consensus structures. One of the frameworks used to study the RNA structure comparison is the longest arc-preserving common subsequence (LAPCS). In this framework, given two RNA structures represented as two arc-annotated sequences, (X, PX) and (Y, PY) , where (1) X and Y are the sequences of nucleotides over Σ = { A , C , G , U } and (2) PX and PY are the set of arcs that represent the chemical bonds between complementary bases. The goal of the LAPCS problem is to determine the maximal length of the common subsequence of X and Y preserving all the arcs that link a subsequence of nucleotides. In this paper, we developed a fast heuristic algorithm for LAPCS in which the two types of arc-annotated sequence are CROSSING and PLAIN. The algorithm is based on dynamic programming and a lookup table, and runs in O(nm) , where n and m are the lengths of the sequences. The experiments on artificial data, consisting of different sequence lengths and number of arcs showed that the proposed algorithm outperformed the fastest known heuristic algorithm for the LAPCS problem in terms of both time complexity and output length. Specifically, the proposed algorithm achieved, on average, a 96% improvement in running time and was able to generate a LAPCS that was better in terms of length. © 2022, The Author(s), under exclusive licence to Bharati Vidyapeeth's Institute of Computer Applications and Management.
引用
收藏
页码:3019 / 3029
页数:10
相关论文
共 30 条
[1]  
Sekhar S., Siddesh G., Raj M., Manvi S.S., Protein class prediction based on count vectorizer and long short term memory, Int J Inf Technol, 13, 1, pp. 341-348, (2021)
[2]  
Sasikala S., Ratha Jeyalakshmi T., GSCNN: a composition of CNN and Gibb Sampling computational strategy for predicting promoter in bacterial genomes, Int. j. inf. tecnol, 13, 2, pp. 493-499, (2021)
[3]  
Abbass M.M., Bahig H.M., An efficient algorithm to identify DNA motifs, Math Comput Sci, 7, 4, pp. 387-399, (2013)
[4]  
Abbass M.M., Bahig H.M., Abouelhoda M., Mohie-Eldin M., Parallelizing exact motif finding algorithms on multi-core, J Supercomput, 69, 2, pp. 814-826, (2014)
[5]  
Li Q., Zhang L., Xu L., Zou Q., Wu J., Li Q., Identification and classification of promoters using the attention mechanism based on long short-term memory, Front Comput Sci, 16, 4, (2022)
[6]  
Abbas M.M., Abouelhoda M., Bahig H.M., A hybrid method for the exact planted (l, d) motif finding problem and its parallelization, BMC Bioinformatics, 13, 17, (2012)
[7]  
Abbas M.M., Bahig H.M., A fast exact sequential algorithm for the partial digest problem, BMC Bioinformatics, 17, 19, (2016)
[8]  
Abbass M.M., Bahig H.M., Mohie-Eldin M., Parallelizing partial digest problem on multicore system, In International Symposium on Bioinformatics Research and Applications, pp. 174-178, (2017)
[9]  
Bahig H.M., Abbas M., A scalable parallel algorithm for turnpike problem, J Egyptian Math Soc, 26, 1, pp. 18-26, (2018)
[10]  
Zhuozhi W., Kaizhong Z., RNA secondary structure prediction, Current Topics in Computational Molecular Biology, pp. 345-364, (2002)