Combining evidence using p-values: application to sequence homology searches

Cited by: 937
Authors
Bailey, TL [1]
Gribskov, M [1]
Affiliations
[1] San Diego Supercomp Ctr, San Diego, CA 92186 USA
DOI
10.1093/bioinformatics/14.1.48
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: To illustrate an intuitive and statistically valid method for combining independent sources of evidence that yields a p-value for the complete evidence, and to apply it to the problem of detecting simultaneous matches to multiple patterns in sequence homology searches. Results: In sequence analysis, two or more (approximately) independent measures of the membership of a sequence (or sequence region) in some class are often available. We would like to estimate the likelihood of the sequence being a member of the class in view of all the available evidence. An example is estimating the significance of the observed match of a macromolecular sequence (DNA or protein) to a set of patterns (motifs) that characterize a biological sequence family. An intuitive way to do this is to express each piece of evidence as a p-value, and then use the product of these p-values as the measure of membership in the family. We derive a formula and algorithm (QFAST) for calculating the statistical distribution of the product of n independent p-values. We demonstrate that sorting sequences by this p-value effectively combines the information present in multiple motifs, leading to highly accurate and sensitive sequence homology searches.
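The idea of turning a product of p-values into a single p-value can be illustrated with a minimal sketch. It assumes the n p-values are independent and uniformly distributed on [0, 1] under the null hypothesis, in which case the probability of a product as small as p is p multiplied by the sum of (-ln p)^i / i! for i from 0 to n-1. The function name combined_p_value is hypothetical, and this is the textbook closed form evaluated directly; it is not claimed to be the QFAST algorithm as implemented by the authors.

```python
import math


def combined_p_value(p_values):
    """P-value of the product of independent p-values.

    Assumes each input is an independent Uniform(0, 1) p-value under the
    null hypothesis (an assumption of this sketch, not checked here).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 <= p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in [0, 1]")
    if any(p == 0.0 for p in p_values):
        return 0.0  # the product is zero, so the combined p-value is zero

    # Work with the log of the product to avoid underflow for many small values.
    log_product = sum(math.log(p) for p in p_values)  # always <= 0
    t = -log_product

    # Accumulate sum_{i=0}^{n-1} t^i / i! one term at a time.
    term = 1.0
    total = 1.0
    for i in range(1, n):
        term *= t / i
        total += term

    # Clamp to 1.0 to guard against floating-point rounding.
    return min(1.0, math.exp(log_product) * total)


if __name__ == "__main__":
    # Two motif matches, each individually significant at 0.01.
    print(combined_p_value([0.01, 0.01]))
```

With a single input the function returns that p-value unchanged, and with several inputs the result is always at least as large as the raw product, which is why the raw product cannot itself be read as a p-value.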
Pages: 48-54
Page count: 7