A Grammatical Inference Sequential Mining Algorithm for Protein Fold Recognition

Cited by: 0
Authors
Soliman, Taysir Hassan A. [1]
Eldin, Ahmed Sharaf [2]
Ghareeb, Marwa M. [3]
Marie, Mohammed E. [2]
Affiliations
[1] Assiut Univ, Fac Comp & Informat, Informat Syst Dept, Assiut, Egypt
[2] Helwan Univ, Fac Comp & Informat, Informat Syst Dept, Assiut, Egypt
[3] Modern Acad, Fac Comp & Informat, Informat Syst Dept, Assiut, Egypt
Keywords
Data mining; grammatical inference; sequential mining; protein fold recognition
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Protein fold recognition plays an important role in computational protein analysis because it can help determine the function of proteins whose structure is unknown. This paper proposes a Classified Sequential Pattern mining technique for Protein Fold Recognition (CSPF). CSPF consists of two main phases: the sequential pattern mining phase and the fold recognition phase. In the sequential pattern mining phase, which serves as the training phase, the Mix & Test algorithm is developed based on grammatical inference. Mix & Test minimizes I/O costs by scanning the database only once, discovers subsequence combinations directly from sequences in memory without searching the whole sequence file, requires no database projection, handles gaps, and works with variable-length sequences without having to align them. In addition, a parallelized version of Mix & Test is applied to speed up its performance. In the fold recognition phase, the folds of unknown proteins are predicted via a proposed testing function. Performance is evaluated on 36 SCOP protein folds, with an accuracy of 75.84% on the training data and 59.7% on the testing data.
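The two-phase idea in the abstract can be illustrated with a small sketch. This is not the authors' Mix & Test algorithm or their testing function, neither of which is specified in this record; it is a minimal, hypothetical stand-in that mines gapped subsequence patterns from unaligned, variable-length sequences in a single pass and then assigns an unknown sequence to the fold whose patterns it shares most. The function names, the `max_gap` and `min_support` parameters, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def gapped_patterns(seq, length, max_gap):
    """Return the set of subsequences of `length` residues in `seq` whose
    consecutive residues are separated by at most `max_gap` skipped positions."""
    found = set()

    def extend(pos, pattern):
        if len(pattern) == length:
            found.add(pattern)
            return
        # The next residue may sit up to max_gap positions past the adjacent one.
        for nxt in range(pos + 1, min(pos + 2 + max_gap, len(seq))):
            extend(nxt, pattern + seq[nxt])

    for start in range(len(seq)):
        extend(start, seq[start])
    return found


def mine_fold(sequences, length=2, max_gap=1, min_support=2):
    """Training sketch: one pass over the in-memory sequences of a single fold,
    keeping patterns that occur in at least `min_support` sequences."""
    counts = Counter()
    for seq in sequences:  # each sequence is visited exactly once
        counts.update(gapped_patterns(seq, length, max_gap))
    return {p for p, c in counts.items() if c >= min_support}


def predict_fold(seq, fold_patterns, length=2, max_gap=1):
    """Hypothetical testing function: pick the fold sharing the most patterns."""
    present = gapped_patterns(seq, length, max_gap)
    return max(fold_patterns, key=lambda fold: len(present & fold_patterns[fold]))


# Toy data (invented for illustration, not SCOP sequences).
training = {"foldA": ["ACDA", "ACCD", "AD"], "foldB": ["GHG", "GGH"]}
models = {fold: mine_fold(seqs) for fold, seqs in training.items()}
print(predict_fold("AXCD", models))
```

Because patterns are matched as gapped subsequences rather than aligned columns, sequences of different lengths can be compared directly, which mirrors the alignment-free property claimed in the abstract.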
Pages: 97-106
Page count: 10