Sequence Pattern Mining based on Markov Chain

Cited by: 3
Authors
Zhang Junyan [1]
Yang Chenhui [1]
Affiliations
[1] Chengdu Univ, Coll Comp Sci, Chengdu Univ Key Lab Pattern Recognit & Intellige, Chengdu, Peoples R China
Source
2015 7TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY IN MEDICINE AND EDUCATION (ITME) | 2015
Keywords
sequence pattern mining; Markov chain; transition probability matrix; minimum support degree
DOI
10.1109/ITME.2015.49
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sequence pattern mining is one of the main challenges in data mining, especially in large biological sequence databases, which consist of a large number of DNA sequences. Many existing methods are time-consuming and scan the database multiple times. To overcome these shortcomings, we propose SPMM, a fast and efficient algorithm based on Markov chains for mining sequence patterns; the approach applies because DNA sequences satisfy the Markov property. We first present the relevant concepts and definitions. We then put forward the SPMM algorithm, in which a transition probability matrix is computed for each DNA sequence. Sequence patterns are identified according to a given threshold of minimum support degree. Several examples illustrate SPMM in detail. The experimental results show that SPMM achieves both faster speed and higher-quality results than other algorithms.
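The abstract does not spell out how SPMM builds its matrix, so the following is only a minimal sketch of the general idea of estimating a first-order transition probability matrix from a single DNA sequence. The function names `transition_matrix` and `transitions_above`, the `ACGT` alphabet, and the use of a probability cutoff in place of the paper's minimum support degree are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumed nucleotide alphabet; not specified in the abstract


def transition_matrix(seq, alphabet=ALPHABET):
    """Estimate first-order transition probabilities from one sequence.

    Returns a dict of dicts: matrix[a][b] is the fraction of occurrences
    of symbol a (as the first element of an adjacent pair) followed by b.
    """
    counts = Counter(zip(seq, seq[1:]))  # counts of adjacent symbol pairs
    matrix = {}
    for a in alphabet:
        row_total = sum(counts[(a, b)] for b in alphabet)
        # A symbol that never starts a pair gets an all-zero row.
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in alphabet
        }
    return matrix


def transitions_above(matrix, threshold):
    """Hypothetical filter: (from, to) pairs with probability >= threshold.

    This stands in for a support check only as an illustration; the
    paper's actual definition of minimum support degree may differ.
    """
    return sorted(
        (a, b)
        for a, row in matrix.items()
        for b, p in row.items()
        if p >= threshold
    )


m = transition_matrix("AACA")
print(m["A"])                     # row of probabilities for transitions out of A
print(transitions_above(m, 0.5))  # pairs meeting the illustrative cutoff
```

Because the matrix is built in a single pass over adjacent pairs, this sketch reads each sequence once, which is in the spirit of the abstract's goal of avoiding repeated database scans.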
Pages: 234-238
Page count: 5