Pairwise Protein Substring Alignment with Latent Semantic Analysis and Support Vector Machines to Detect Remote Protein Homology

Cited by: 0
Authors
Ismail, Surayati [1]
Othman, Razib M. [1]
Kasim, Shahreen [2]
Affiliations
[1] Univ Teknologi Malaysia, Lab Computat Intelligence & Biotechnol, Utm Skudai 81310, Malaysia
[2] Univ Tun Hussein Onn Malaysia, Fac Comp Sci & Informat Technol, Dept Web Technol, Batu Pahat 86400, Malaysia
Source
UBIQUITOUS COMPUTING AND MULTIMEDIA APPLICATIONS, PT II | 2011 / Vol. 151
Keywords
Remote Protein Homology Detection; Protein Substring Scoring; Pairwise Protein Substring Alignment; Latent Semantic Analysis; Support Vector Machines; PREDICTION; SEQUENCES; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Remote protein homology detection is widely used in the analysis of protein structure and function. In this study, the quality of the protein feature vectors is the main factor in detecting remote protein homology, because it helps the discriminative classifier model separate proteins precisely into homologous and non-homologous members. To be considered good quality, feature vectors must contain high protein structural similarity information and be represented in a low dimension that is free from contaminated data. This study investigates the contaminated data that originates from the protein dataset. Such data may prevent a remote protein homology detection framework from producing the best representation of high protein structural similarity information for detecting the homology of proteins. To reduce the contaminated data and extract high protein structural similarity information, some research has been done on the extraction of protein feature vectors and protein similarity. Extracting good-quality protein feature vectors is believed to help achieve better results in remote protein homology detection: feature vectors that contain useful protein similarity information and are represented in a low dimension are used by the discriminative classifier model to identify protein families precisely. On this basis, a method is introduced that combines Protein Substring Scoring (PSS) and Pairwise Protein Substring Alignment (PPSA) from the sequence comparison model, chi-square and Singular Value Decomposition (SVD) from the generative model, and a Support Vector Machine (SVM) as the discriminative classifier model.
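As a rough illustration of the chi-square filtering step named in the abstract, the sketch below scores substrings by how strongly their presence is associated with family membership. It is not the authors' method: the abstract does not define PSS or PPSA, so plain overlapping k-mers are used as a stand-in, and the function names `kmers` and `chi_square_scores`, the toy sequences, and the labels are all hypothetical. The SVD and SVM stages are not shown.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping substrings of length k in a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def chi_square_scores(sequences, labels, k=3):
    """Score each k-mer with a 2x2 chi-square statistic.

    The contingency table is (k-mer present / absent) x (label 1 / label 0).
    Assumption: k-mers stand in for the paper's protein substrings.
    """
    n = len(sequences)
    n_pos = sum(labels)
    n_neg = n - n_pos
    present_pos = Counter()  # sequences with label 1 that contain the k-mer
    present_neg = Counter()  # sequences with label 0 that contain the k-mer
    for seq, label in zip(sequences, labels):
        for term in set(kmers(seq, k)):
            (present_pos if label else present_neg)[term] += 1

    scores = {}
    for term in set(present_pos) | set(present_neg):
        a = present_pos[term]  # label 1, present
        b = present_neg[term]  # label 0, present
        c = n_pos - a          # label 1, absent
        d = n_neg - b          # label 0, absent
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A k-mer found in every sequence carries no class information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data (hypothetical): two "family" sequences and two non-members.
toy_sequences = ["AAAC", "AAAG", "CCCT", "CCCG"]
toy_labels = [1, 1, 0, 0]
toy_scores = chi_square_scores(toy_sequences, toy_labels, k=3)
top_terms = sorted(toy_scores, key=toy_scores.get, reverse=True)[:2]
```

Under these assumptions, the highest-scoring k-mers would be kept as feature dimensions; a count matrix restricted to them could then be reduced with SVD and passed to an SVM.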
Pages: 526 / +
Page count: 3