INFORMATION-THEORETICAL ENTROPY AS A MEASURE OF SEQUENCE VARIABILITY

Times cited: 158
Authors
SHENKIN, PS
ERMAN, B
MASTRANDREA, LD
Affiliation
[1] Department of Chemistry, Barnard College, New York
Source
PROTEINS-STRUCTURE FUNCTION AND GENETICS | 1991 / Vol. 11 / No. 4
Keywords
INFORMATION THEORY; ENTROPY; VARIABILITY; SEQUENCE COMPARISON; IMMUNOGLOBULINS; ANTIBODIES
DOI
10.1002/prot.340110408
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We propose the use of the information-theoretical entropy, $S = -\sum_i p_i \log_2 p_i$, as a measure of variability at a given position in a set of aligned sequences, where $p_i$ is the fraction of times the $i$-th type appears at that position. For protein sequences the sum has up to 20 terms; for nucleotide sequences, up to 4 terms; and for codon sequences, up to 61 terms. We compare $S$ and $V_S$, a related measure, in detail with $V_K$, the traditional measure of immunoglobulin sequence variability, both in the abstract and as applied to the immunoglobulins. We conclude that $S$ has desirable mathematical properties that $V_K$ lacks, and that it has intuitive and statistical meanings that accord well with the notion of variability. We find that $V_K$ and the $S$-based measures are highly correlated for the immunoglobulins. We show, by analysis of sequence data and by means of a mathematical model, that this correlation is due to a strong tendency for the frequencies of occurrence of amino acid types at a given position to be log-linear in their rank. It is not known whether the immunoglobulins are typical or atypical of protein families in this regard, nor is the origin of the observed rank-frequency distribution obvious, although we discuss several possible etiologies.
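A minimal Python sketch of the per-position quantities discussed in the abstract follows. It assumes the alignment is supplied as equal-length strings, and it counts any gap symbol such as '-' as just another type, which is an assumption the paper may not share. `entropy` implements S exactly as defined above. `wu_kabat` uses the standard Wu-Kabat definition of V_K (number of distinct types at a position divided by the fraction of the most common type); that definition comes from general knowledge, not from this record. V_S is not computed because this record does not define it. `log_linear_probs` is a hypothetical toy rank-frequency profile with p_r proportional to q^r, not the authors' mathematical model. All function names are illustrative.

```python
import math
from collections import Counter


def entropy(probs: list[float]) -> float:
    # S = -sum_i p_i log2 p_i; types with p_i = 0 contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)


def wu_kabat(probs: list[float]) -> float:
    # V_K = (number of types present) / (fraction of the most common type).
    present = [p for p in probs if p > 0]
    return len(present) / max(present)


def column_probs(column: str) -> list[float]:
    # Fraction of times each symbol occurs in one alignment column.
    # Assumption: a gap character, if present, is counted as one more type.
    counts = Counter(column)
    n = len(column)
    return [c / n for c in counts.values()]


def profile(alignment: list[str]) -> list[tuple[float, float]]:
    # (S, V_K) for every position of a set of equal-length aligned sequences.
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    result = []
    for j in range(length):
        probs = column_probs("".join(seq[j] for seq in alignment))
        result.append((entropy(probs), wu_kabat(probs)))
    return result


def log_linear_probs(k: int, q: float) -> list[float]:
    # Hypothetical log-linear rank-frequency profile: p_r proportional to q**r
    # for ranks r = 0..k-1, so log p_r falls linearly with rank.
    weights = [q ** r for r in range(k)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy alignment of four sequences; position 0 is invariant.
    for s, vk in profile(["ACDE", "ACDF", "AKDG", "ACEH"]):
        print(f"S = {s:.3f}  V_K = {vk:.2f}")
    # Toy model: both measures rise together as the profile flattens.
    for q in (0.2, 0.5, 0.8):
        p = log_linear_probs(20, q)
        print(q, round(entropy(p), 3), round(wu_kabat(p), 2))
```

In the toy model the number of types k stays fixed. Raising q flattens the profile, so S rises toward log2 k, and V_K = k / p_max also rises as p_max falls. Because a single parameter drives both measures, they move together. This is the kind of coupling the abstract attributes to log-linear rank-frequency data. It is an illustration of that idea, not a reproduction of the paper's analysis.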
Pages: 297-313
Page count: 17