Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop

Cited by: 8
Author
Newell, Nicholas E.
Affiliation
[1] Reading MA 01867
Keywords
HUMAN-IMMUNODEFICIENCY-VIRUS; DISCOVERY; SELECTION; RESIDUES
DOI
10.1093/bioinformatics/btr594
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Results: Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources.
Pages: 3415-3422
Page count: 8
Related papers
23 in total
  • [1] Helix capping
    Aurora, R
    Rose, GD
    [J]. PROTEIN SCIENCE, 1998, 7 (01) : 21 - 38
  • [2] Beck Zachary Q., 2002, Current Drug Targets - Infectious Disorders, V2, P37, DOI 10.2174/1568005024605837
  • [3] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [4] BIRCH MW, 1963, J ROY STAT SOC B, V25, P220
  • [5] Bishop M.M., 1975, DISCRETE MULTIVARIAT
  • [6] Research on collaborative negotiation for e-commerce.
    Feng, YQ
    Lei, Y
    Li, Y
    Cao, RZ
    [J]. 2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2085 - 2088
  • [7] Computational intelligence approaches for pattern discovery in biological systems
    Fogel, Gary B.
    [J]. BRIEFINGS IN BIOINFORMATICS, 2008, 9 (04) : 307 - 316
  • [8] MSDmotif: exploring protein sites and motifs
    Golovin, Adel
    Henrick, Kim
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [9] Gene selection for cancer classification using support vector machines
    Guyon, I
    Weston, J
    Barnhill, S
    Vapnik, V
    [J]. MACHINE LEARNING, 2002, 46 (1-3) : 389 - 422
  • [10] PROBING THE ROLES OF RESIDUES AT THE E-POSITION AND G-POSITION OF THE GCN4 LEUCINE-ZIPPER BY COMBINATORIAL MUTAGENESIS
    HU, JC
    NEWELL, NE
    TIDOR, B
    SAUER, RT
    [J]. PROTEIN SCIENCE, 1993, 2 (07) : 1072 - 1084