AN EXPECTATION MAXIMIZATION (EM) ALGORITHM FOR THE IDENTIFICATION AND CHARACTERIZATION OF COMMON SITES IN UNALIGNED BIOPOLYMER SEQUENCES

Cited by: 316
Authors
LAWRENCE, CE
REILLY, AA
Affiliations
[1] Biometrics Laboratory, Wadsworth Center for Laboratories and Research, New York State Department of Health, Albany, New York
Source
PROTEINS-STRUCTURE FUNCTION AND GENETICS | 1990 / Vol. 7 / Issue 01
Keywords
CRP; DNA binding proteins; finite mixtures; maximum likelihood; transcription regulation;
DOI
10.1002/prot.340070105
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Statistical methodology for the identification and characterization of protein binding sites in a set of unaligned DNA fragments is presented. Each sequence must contain at least one common site. No alignment of the sites is required. Instead, the uncertainty in the location of the sites is handled by employing the missing information principle to develop an “expectation maximization” (EM) algorithm. This approach allows for the simultaneous identification of the sites and characterization of the binding motifs. The reliability of the algorithm increases with the number of fragments, but the computations increase only linearly. The method is illustrated with an example, using known cyclic adenosine monophosphate receptor protein (CRP) binding sites. The final motif is utilized in a search for undiscovered CRP binding sites. Copyright © 1990 Wiley‐Liss, Inc.
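The idea summarized in the abstract, treating the unknown site position in each fragment as missing data and alternating between estimating positions and re-estimating the motif, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `em_motif`, the seed taken from the first window of the first sequence, the pseudocount, the fixed background frequencies, and the assumption of exactly one fixed-width site per sequence are all simplifications introduced here for illustration.

```python
import math

ALPHABET = "ACGT"


def em_motif(seqs, width, n_iter=50, pseudocount=0.5):
    """Hypothetical EM sketch: one site of fixed width per sequence, position unknown.

    Assumes every sequence uses only A, C, G, T and is at least `width` long.
    """
    idx = {c: i for i, c in enumerate(ALPHABET)}

    # Background base frequencies from overall composition.
    # Held fixed here for simplicity (an assumption of this sketch).
    counts = [pseudocount] * len(ALPHABET)
    for s in seqs:
        for c in s:
            counts[idx[c]] += 1
    total = sum(counts)
    background = [c / total for c in counts]

    # Seed the motif from the first window of the first sequence (arbitrary choice).
    seed = seqs[0][:width]
    motif = [[0.7 if idx[seed[j]] == k else 0.1 for k in range(4)]
             for j in range(width)]

    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability of each start position in each sequence,
        # with a uniform prior over positions. Computed in log space.
        posteriors = []
        for s in seqs:
            logw = []
            for p in range(len(s) - width + 1):
                logw.append(sum(
                    math.log(motif[j][idx[s[p + j]]] / background[idx[s[p + j]]])
                    for j in range(width)))
            m = max(logw)
            w = [math.exp(x - m) for x in logw]
            z = sum(w)
            posteriors.append([x / z for x in w])

        # M-step: expected base counts at each motif column, then normalize.
        new = [[pseudocount] * 4 for _ in range(width)]
        for s, post in zip(seqs, posteriors):
            for p, q in enumerate(post):
                for j in range(width):
                    new[j][idx[s[p + j]]] += q
        motif = [[v / sum(row) for v in row] for row in new]

    # posteriors are those of the last E-step (before the final M-step update).
    return motif, posteriors
```

Each iteration visits every window of every sequence once, so the work per iteration grows linearly with the number of fragments, in line with the scaling noted in the abstract. Because EM converges to a local optimum, the result of this sketch depends on the seed.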
Pages: 41-51
Number of pages: 11