SEQUENCE-BASED ENZYME CATALYTIC DOMAIN PREDICTION USING CLUSTERING AND AGGREGATED MUTUAL INFORMATION CONTENT

Cited by: 6

Authors:

Choi, Kwangmin [1]

Kim, Sun [2,3]

Affiliations:

[1] Cincinnati Childrens Hosp Med Ctr, Div Expt Hematol & Canc Biol, Cincinnati, OH 45229 USA

[2] Seoul Natl Univ, Sch Comp Sci & Engn, Seoul 151744, South Korea

[3] Seoul Natl Univ, Interdisciplinary Program Bioinformat, Seoul 151744, South Korea

Source:

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY | 2011 / Vol. 9 / Issue 05

Keywords:

Enzyme; functional domains; active site; clustering; aggregated mutual information content; RESIDUES; SITES;

DOI:

10.1142/S0219720011005677

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Characterizing enzyme sequences and identifying their active sites is an important task. Current experimental methods are too expensive and labor-intensive to keep pace with the rapidly accumulating protein sequence and structure data. Accurate, high-throughput in silico methods for identifying catalytic residues and predicting enzyme function are therefore much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method that combines sequence clustering with an information-theoretic approach. The first step is a sequence clustering analysis of enzyme sequences from the same functional category (those with the same EC label), which handles the problem of widely varying sequence similarity levels among enzyme sequences. The clustering analysis constructs a sequence graph in which nodes are enzyme sequences and edges connect pairs of sequences with a certain degree of sequence similarity, and it uses graph properties such as biconnected components and articulation points to generate sequence segments common to the enzyme sequences. The amino acid subsequences in these shared regions are then aligned, and an information-theoretic approach, the aggregated column-related scoring scheme, is applied to highlight potential active sites in the enzyme sequences. The aggregated information content scoring scheme is shown to highlight active-site residues effectively. The proposed combination of clustering and aggregated information content scoring successfully highlighted known catalytic sites in enzymes of Escherichia coli K12, as recorded in the Catalytic Site Atlas database.
Our method is shown to be not only accurate in predicting potential active sites in enzyme sequences but also computationally efficient: the clustering approach relies on two graph properties that can be computed in time linear in the number of edges of the sequence graph, and computing mutual information does not require much time. We believe that the proposed method can be useful for identifying active sites in enzyme sequences from many genome projects.
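
The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy graph, the toy alignment, and the function names `articulation_points`, `mutual_information`, and `aggregated_mi` are all hypothetical, and "aggregated" is assumed here to mean the sum of a column's pairwise mutual information with every other column, which the abstract does not spell out. The articulation-point search is a standard depth-first search that runs in time linear in the number of nodes plus edges.

```python
from collections import Counter
from itertools import combinations
from math import log2


def articulation_points(graph):
    """Articulation points of an undirected graph given as {node: set(neighbours)}."""
    disc, low, points = {}, {}, set()
    counter = 0

    def dfs(node, parent):
        nonlocal counter
        disc[node] = low[node] = counter
        counter += 1
        children = 0
        for nbr in graph[node]:
            if nbr == parent:
                continue
            if nbr in disc:
                # Back edge: nbr was reached earlier in the search.
                low[node] = min(low[node], disc[nbr])
            else:
                children += 1
                dfs(nbr, node)
                low[node] = min(low[node], low[nbr])
                # A non-root node is a cut vertex if a child subtree
                # cannot reach above it.
                if parent is not None and low[nbr] >= disc[node]:
                    points.add(node)
        # The root is a cut vertex only if it has several DFS children.
        if parent is None and children > 1:
            points.add(node)

    for start in graph:
        if start not in disc:
            dfs(start, None)
    return points


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )


def aggregated_mi(alignment):
    """Assumed aggregation: sum each column's MI with all other columns."""
    columns = list(zip(*alignment))
    scores = [0.0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        mi = mutual_information(columns[i], columns[j])
        scores[i] += mi
        scores[j] += mi
    return scores


# Hypothetical sequence graph: two similarity triangles sharing sequence s3.
toy_graph = {
    "s1": {"s2", "s3"},
    "s2": {"s1", "s3"},
    "s3": {"s1", "s2", "s4", "s5"},
    "s4": {"s3", "s5"},
    "s5": {"s3", "s4"},
}
cut_vertices = articulation_points(toy_graph)

# Hypothetical aligned segments (equal length, no gaps).
toy_alignment = ["ACDA", "ACEA", "GTDA", "GTEA"]
column_scores = aggregated_mi(toy_alignment)
```

In this toy input, `s3` is the only sequence whose removal disconnects the graph, and only the first two alignment columns vary together, so they are the only columns with a non-zero score. A fully conserved column scores zero under plain mutual information, so a real scoring scheme would need to treat conservation separately.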


Pages: 597-611

Page count: 15
