A probabilistic treatment of the missing spot problem in 2D gel electrophoresis experiments

Times cited: 14
Authors
Krogh, Morten [1]
Fernandez, Celine
Teilum, Maria
Bengtsson, Sofia
James, Peter
Affiliations
[1] Lund Univ, Dept Theoret Phys, S-22100 Lund, Sweden
[2] Lund Univ, Dept Expt Med Sci, Div Diabet Metab & Endocrinol Mol, Endocrinol Grp, S-22100 Lund, Sweden
[3] Lund Univ, Wallenberg Neurosci Ctr, Expt Brain Res Lab, S-22100 Lund, Sweden
[4] Lund Univ, Dept Prot Technol, S-22100 Lund, Sweden
Keywords
2D-PAGE; missing values; maximum likelihood;
DOI
10.1021/pr070137p
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Two-dimensional SDS-PAGE gel electrophoresis using post-run staining is widely used to measure the abundances of thousands of protein spots simultaneously. Usually, the protein abundances of two or more biological groups are compared using biological and technical replicates. After gel separation and staining, the spots are detected, spot volumes are quantified, and spots are matched across gels. The resulting data set almost always contains many missing values. The missing values arise either because the corresponding proteins have very low abundances (or are absent) or because of experimental errors such as incomplete focusing or overfocusing in the first dimension, varying run times in the second dimension, and faulty spot detection and matching. In this study, we show that the probability for a spot to be missing can be modeled by a logistic regression function of the logarithm of the volume. Furthermore, we present an algorithm that takes a set of gels with technical and biological replicates as input and estimates the average protein abundances in the biological groups from the number of missing spots and the measured volumes of the present spots using a maximum likelihood approach. Confidence intervals for abundances and p-values for differential expression between two groups are calculated using bootstrap sampling. The algorithm is compared to two standard approaches, one that discards missing values and one that sets all missing values to zero. We have evaluated this approach on two gel data sets of different biological origin. An R program implementing the algorithm is freely available at http://bioinfo.thep.lu.se/MissingValues2Dgels.html.
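The idea in the abstract, a logistic missingness probability combined with a maximum likelihood estimate of abundance, can be illustrated with a minimal Python sketch. This is not the authors' R program: the Gaussian model for log-volumes, the fixed spread `sigma`, the logistic parameters `x50` and `slope`, and all function names are hypothetical choices made for illustration only. Bootstrap confidence intervals are omitted.

```python
import math

def p_missing(x, x50, slope):
    """Assumed logistic model: probability that a spot with log-volume x is missing."""
    return 1.0 / (1.0 + math.exp(slope * (x - x50)))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def prob_missing_given_mu(mu, sigma, x50, slope, steps=400):
    """Marginal missing probability: trapezoid integral of
    p_missing(x) * N(x; mu, sigma) over mu +/- 8 sigma."""
    lo, hi = mu - 8.0 * sigma, mu + 8.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * p_missing(x, x50, slope) * normal_pdf(x, mu, sigma)
    return total * h

def log_likelihood(mu, observed, n_missing, sigma, x50, slope):
    """Present spots contribute (1 - p_missing) * density; each missing spot
    contributes the marginal probability of being missing."""
    ll = 0.0
    for x in observed:
        ll += math.log(1.0 - p_missing(x, x50, slope))
        ll += math.log(normal_pdf(x, mu, sigma))
    if n_missing:
        ll += n_missing * math.log(prob_missing_given_mu(mu, sigma, x50, slope))
    return ll

def estimate_mu(observed, n_missing, sigma, x50, slope, lo=0.0, hi=20.0, tol=1e-5):
    """Golden-section search for the mean log-volume that maximizes the likelihood."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda m: log_likelihood(m, observed, n_missing, sigma, x50, slope)
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:  # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:        # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)

# Hypothetical data: three measured log-volumes, with zero or two missing replicates.
observed = [10.0, 10.4, 9.6]
mu_no_missing = estimate_mu(observed, 0, sigma=0.5, x50=8.0, slope=2.0)
mu_two_missing = estimate_mu(observed, 2, sigma=0.5, x50=8.0, slope=2.0)
```

With no missing replicates the estimate reduces to the mean of the measured log-volumes, which is what the "discard missing values" approach would report. With missing replicates the estimate is pulled toward lower abundance, but less drastically than setting the missing volumes to zero.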
Pages: 3335-3343
Number of pages: 9