Motivation: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to predict the fold of a target protein indirectly from the fold of a template protein with known structure, an approach that cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences directly into folds, and because of methodological limitations they cover only a small number of folds, so they are not generally useful in practice.

Results: We develop a deep 1D convolutional neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence-structure relationship. Unlike traditional methods based on sequence alignment (comparison), our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on datasets curated from SCOP1.75, yielding an average classification accuracy of 75.3%. On an independent test dataset curated from SCOP2.06, the classification accuracy is 73.0%. We compare our method with a top profile-profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 12.63-26.32% higher than that of HHSearch on template-free modeling targets and 3.39-17.09% higher on hard template-based modeling targets for the top 1, 5 and 10 predicted folds. The hidden features extracted from the sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking.
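To illustrate how a 1D convolutional network can map a sequence of any length to a fixed set of fold classes, the following is a minimal NumPy sketch, not the DeepSF implementation. The abstract does not describe the input features, layer sizes or pooling scheme, so everything here is an assumption made for illustration: one-hot amino-acid input, a single convolution layer with 32 filters of width 6, global max pooling over the sequence axis, and randomly initialized (untrained) weights. The names `one_hot`, `conv1d_relu`, `predict_fold_probabilities`, `top_k_folds` and `params` are hypothetical. Only the number of output classes (1195) comes from the text.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUM_FOLDS = 1195  # number of known folds stated in the abstract


def one_hot(sequence):
    """Encode a protein sequence as an (L, 20) one-hot matrix (assumed input)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        if aa in index:  # unknown residues are left as all-zero rows
            encoded[pos, index[aa]] = 1.0
    return encoded


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    x: (L, C_in), kernels: (width, C_in, C_out), bias: (C_out,)
    Returns an array of shape (L - width + 1, C_out).
    """
    width = kernels.shape[0]
    steps = x.shape[0] - width + 1
    out = np.empty((steps, kernels.shape[2]))
    for t in range(steps):
        window = x[t:t + width]  # (width, C_in) slice of the sequence
        out[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)


def predict_fold_probabilities(sequence, params):
    """Map a sequence of any length (>= kernel width) to fold probabilities."""
    features = conv1d_relu(one_hot(sequence), params["kernels"], params["bias"])
    pooled = features.max(axis=0)      # global max pooling: fixed size for any L
    logits = pooled @ params["dense"]  # (NUM_FOLDS,)
    logits = logits - logits.max()     # stabilize the softmax
    probs = np.exp(logits)
    return probs / probs.sum()


def top_k_folds(probs, k):
    """Indices of the k most probable folds, best first."""
    return np.argsort(probs)[::-1][:k]


# Untrained, randomly initialized weights; sizes are illustrative assumptions.
rng = np.random.default_rng(0)
params = {
    "kernels": rng.normal(scale=0.1, size=(6, 20, 32)),
    "bias": np.zeros(32),
    "dense": rng.normal(scale=0.1, size=(32, NUM_FOLDS)),
}

probs = predict_fold_probabilities("MKTAYIAKQRQISFVKSHFSRQ", params)
print(top_k_folds(probs, 5))  # candidate fold indices for a top-5 evaluation
```

The pooling step is what removes the dependence on sequence length: the convolution output has one row per window position, and taking the maximum over positions leaves one value per filter. A top-k evaluation such as the top 1, 5 and 10 comparison in the abstract would count a prediction as correct when the true fold index appears among the first k entries returned by `top_k_folds`.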