Protein Classification using Hidden Markov Models and Randomised Decision Trees

Cited by: 0
Authors
Lacey, Arron [1]
Deng, Jingjing [1]
Xie, Xianghua [1]
Affiliation
[1] Swansea Univ, Dept Comp Sci, Swansea SA2 8PP, W Glam, Wales
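Source
2014 7TH INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2014) | 2014
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering];
Discipline classification code
0831;
Abstract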
Since the introduction of next-generation sequencing, there has been a demand for sophisticated methods to classify proteins based on sequence data. Two main approaches to this task are to align the raw sequence data against other sequences, or to extract discrete high-level features from the protein sequences and compare those features. Two machine learning methods are demonstrated, one for each approach. Profile Hidden Markov Models are built from multiple alignments of raw sequence data; they learn amino acid emission and transition parameters for a given alignment and effectively harness the power of aligning a test protein to a model built from many proteins. Random Forests, on the other hand, are used to discriminate between two sets of proteins based on features such as functional amino acid groups and physicochemical properties extracted from the raw sequences. The strengths and limitations of each method are presented and discussed, focussing on their individual merits and how they could complement each other, rather than comparing them solely by classification accuracy.
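The feature-based approach mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify which features or groupings the authors used, so everything below is an assumption made for illustration: the function name `sequence_features`, the four-way grouping in `GROUPS`, and the choice of the Kyte-Doolittle hydropathy scale as the physicochemical property. The resulting feature dictionaries are the kind of fixed-length input a Random Forest classifier could be trained on.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical functional grouping of the 20 standard amino acids.
# This is an illustrative assumption, not the grouping used in the paper.
GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGP"),
    "positive": set("KRH"),
    "negative": set("DE"),
}


def sequence_features(seq):
    """Turn a raw protein sequence into a small fixed-length feature dict."""
    # Keep only the 20 standard residues; ambiguous codes such as X are skipped.
    residues = [r for r in seq.upper() if r in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    n = len(residues)
    counts = Counter(residues)

    # Fraction of residues falling into each functional group.
    features = {
        "frac_" + name: sum(counts[r] for r in members) / n
        for name, members in GROUPS.items()
    }
    # One physicochemical summary: mean hydropathy over the sequence.
    features["mean_hydropathy"] = sum(KD[r] for r in residues) / n
    return features
```

Because every sequence maps to the same five keys regardless of its length, feature vectors from two sets of proteins can be stacked into a matrix and passed to any standard Random Forest implementation. A profile HMM, by contrast, works directly on aligned residues and needs no such feature step.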
Pages: 659-664
Page count: 6