Testing machine learning techniques for general application by using protein secondary structure prediction. A brief survey with studies of pitfalls and benefits using a simple progressive learning approach

Cited by: 5
Authors
Robson, Barry [1, 2]
Affiliations
[1] Ingine Inc, Cleveland, OH 43212 USA
[2] Dirac Fdn Oxfordshire, Witney, England
Keywords
Machine learning; Neural nets; Deep learning; bioinformatics; Secondary structure prediction; Homology; CODE RELATING SEQUENCE; UNIVERSAL EXCHANGE; INFERENCE LANGUAGE; DECISION SUPPORT; GLOBULAR-PROTEINS; CONFORMATION; MEDICINE;
DOI
10.1016/j.compbiomed.2021.104883
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07; 0710; 09;
Abstract
Many researchers have recently used the prediction of protein secondary structure (local conformational states of amino acid residues) to test advances in predictive and machine learning technology such as Neural Net Deep Learning. Protein secondary structure prediction continues to be a helpful tool in research in biomedicine and the life sciences, but it is also extremely enticing for testing predictive methods such as neural nets that are intended for different or more general purposes. A complication is highlighted here for researchers testing their methods for other applications. Modern protein databases inevitably contain important clues to the answer, so-called "strong buried clues", though often obscurely; they are hard to avoid. This is because most proteins or parts of proteins in a modern protein database are related to others by biological evolution. For researchers developing machine learning and predictive methods, this can overstate and so confuse understanding of the true quality of a predictive method. However, for researchers using the algorithms as tools, understanding strong buried clues is of great value, because they need to make maximum use of all information available. A simple method related to the GOR methods, but with some features of neural nets in the sense of progressive learning of large numbers of weights, is used to explore this. It can acquire tens of millions and hence gigabytes of weights, but they are learned stably by exhaustive sampling. The significance of the findings is discussed in the light of promising recent results from AlphaFold using Google's DeepMind.
Pages: 18