Protein Classification using Hidden Markov Models and Randomised Decision Trees

Cited by: 0

Authors:

Lacey, Arron [1]

Deng, Jingjing [1]

Xie, Xianghua [1]

Affiliation:

[1] Swansea Univ, Dept Comp Sci, Swansea SA2 8PP, W Glam, Wales

Source:

2014 7TH INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2014) | 2014

Keywords:

DOI:

Not available

Chinese Library Classification number:

R318 [Biomedical Engineering];

Discipline classification code:

0831;

Abstract:

Since the introduction of next-generation sequencing there has been a demand for sophisticated methods to classify proteins based on sequence data. The two main approaches to this task are to align the raw sequence data against other sequences, or to extract discrete high-level features from the protein sequences and compare those features. Two machine learning methods are demonstrated, one for each approach. Profile Hidden Markov Models are built from multiple alignments of raw sequence data; they learn amino acid emission and transition parameters for a given alignment and so harness the power of aligning a test protein to a model built from many proteins. Random Forests, on the other hand, are used to discriminate between two sets of proteins based on features such as functional amino acid groups and physicochemical properties extracted from the raw sequences. The strengths and limitations of each method are presented and discussed, focusing on their individual merits and on how they could complement each other, rather than comparing them by classification accuracy alone.
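The feature-based approach mentioned in the abstract can be illustrated with a minimal sketch. This record does not list the paper's actual features, so everything below is an assumption for illustration: the function name `sequence_features`, the grouping of amino acids into `GROUPS`, and the choice of mean Kyte-Doolittle hydropathy (reference [10]) as the single physicochemical property. A vector like this could then be passed to any Random Forest implementation.

```python
# Hypothetical feature extraction for a protein sequence (illustrative only;
# not the feature set used in the paper).

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumed functional grouping; other groupings are equally plausible.
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "CGP",
}


def sequence_features(sequence):
    """Return group fractions and mean hydropathy for one sequence."""
    # Keep only standard residues; ambiguous codes such as X or B are skipped.
    residues = [r for r in sequence.upper() if r in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")

    n = len(residues)
    features = {}
    for name, members in GROUPS.items():
        count = sum(1 for r in residues if r in members)
        features["frac_" + name] = count / n
    features["mean_hydropathy"] = sum(KYTE_DOOLITTLE[r] for r in residues) / n
    return features


if __name__ == "__main__":
    # Arbitrary short peptide, used only to show the output shape.
    for key, value in sequence_features("MKTAYIAKQR").items():
        print(key, round(value, 3))
```

Each sequence maps to a fixed-length vector regardless of its length, which is what lets a tree-based classifier compare proteins without aligning them.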


Pages: 659-664

Page count: 6

References (16 in total):

[1] Altschul SF, 1996, METHOD ENZYMOL, V266, P460
[2] Helix packing in membrane proteins
Bowie, JU
[J]. JOURNAL OF MOLECULAR BIOLOGY, 1997, 272 (05) : 780 - 789
[3] Antifreeze proteins
Davies, PL
Sykes, BD
[J]. CURRENT OPINION IN STRUCTURAL BIOLOGY, 1997, 7 (06) : 828 - 834
[4] Profile hidden Markov models
Eddy, SR
[J]. BIOINFORMATICS, 1998, 14 (09) : 755 - 763
[5] Pfam: the protein families database
Finn, Robert D.
Bateman, Alex
Clements, Jody
Coggill, Penelope
Eberhardt, Ruth Y.
Eddy, Sean R.
Heger, Andreas
Hetherington, Kirstie
Holm, Liisa
Mistry, Jaina
Sonnhammer, Erik L. L.
Tate, John
Punta, Marco
[J]. NUCLEIC ACIDS RESEARCH, 2014, 42 (D1) : D222 - D230
[6] Scores for sequence searches and alignments
Henikoff, S
[J]. CURRENT OPINION IN STRUCTURAL BIOLOGY, 1996, 6 (03) : 353 - 360
[7] AFP-Pred: A random forest approach for predicting antifreeze proteins from sequence-derived properties
Kandaswamy, Krishna Kumar
Chou, Kuo-Chen
Martinetz, Thomas
Moeller, Steffen
Suganthan, P. N.
Sridharan, S.
Pugalenthi, Ganesan
[J]. JOURNAL OF THEORETICAL BIOLOGY, 2011, 270 (01) : 56 - 62
[8] KEGG: Kyoto Encyclopedia of Genes and Genomes
Kanehisa, M
Goto, S
[J]. NUCLEIC ACIDS RESEARCH, 2000, 28 (01) : 27 - 30
[9] AAindex: amino acid index database, progress report 2008
Kawashima, Shuichi
Pokarowski, Piotr
Pokarowska, Maria
Kolinski, Andrzej
Katayama, Toshiaki
Kanehisa, Minoru
[J]. NUCLEIC ACIDS RESEARCH, 2008, 36 : D202 - D205
[10] A SIMPLE METHOD FOR DISPLAYING THE HYDROPATHIC CHARACTER OF A PROTEIN
KYTE, J
DOOLITTLE, RF
[J]. JOURNAL OF MOLECULAR BIOLOGY, 1982, 157 (01) : 105 - 132
