Evaluating a linear k-mer model for protein-DNA interactions using high-throughput SELEX data

Cited by: 6
Authors
Kahara, Juhani [1]
Lahdesmaki, Harri [1,2]
Affiliations
[1] Aalto Univ, Sch Sci, Dept Informat & Comp Sci, FI-00076 Aalto, Finland
[2] Turku Univ, Turku Ctr Biotechnol, Turku, Finland
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Funding
Academy of Finland;
Keywords
SIGNALS;
DOI
10.1186/1471-2105-14-S10-S2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification numbers
071010 ; 081704 ;
Abstract
Transcription factor (TF) binding to DNA can be modeled in a number of different ways. It is highly debated which modeling methods are best, how the models should be built, and what they can be applied to. In this study, a linear k-mer model proposed for predicting TF specificity in protein binding microarrays (PBM) is applied to high-throughput SELEX data, and the question of how to choose the most informative k-mers for the binding model is studied. We implemented the standard cross-validation scheme to reduce the number of k-mers in the model and observed that the number of k-mers can often be reduced significantly without a large negative effect on prediction accuracy. We also found that the later SELEX enrichment cycles provide much better discrimination between bound and unbound sequences, as model prediction accuracies increased with the cycle number for all proteins. We compared the prediction performance of k-mer and position specific weight matrix (PWM) models derived from the same SELEX data. Consistent with previous results on PBM data, the k-mer model performed on average 9 percentage points better. For the 15 proteins in the SELEX data set with medium enrichment cycles, classification accuracies were on average 71% and 62% for the k-mer and PWM models, respectively. Finally, the k-mer model trained with SELEX data was evaluated on ChIP-seq data, demonstrating substantial improvements for some proteins. For protein GATA1, the model can distinguish between true ChIP-seq peaks and negative peaks. For proteins RFX3 and NFATC1, the performance of the model was no better than chance.
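To make the idea of a linear k-mer model and cross-validated k-mer reduction concrete, here is a minimal Python sketch under stated assumptions. A sequence is represented by its overlapping k-mer counts (forward strand only) and scored as a weighted sum of those counts; a score above zero is classified as bound. The weight estimator `fit_log_ratio_weights` (a smoothed log ratio of k-mer frequencies in bound versus unbound sequences) is a simple stand-in chosen for illustration, not the estimator used in the paper, and all function names here are hypothetical. `choose_n_by_cv` imitates the reduction step: it keeps the n k-mers with the largest absolute weights and returns the smallest n that reaches the best mean held-out accuracy over interleaved folds.

```python
from collections import Counter
import math


def kmer_counts(seq, k):
    """Count overlapping k-mers on the given strand only (ignoring the reverse complement is an assumption)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def fit_log_ratio_weights(bound, unbound, k, pseudo=1.0):
    """Hypothetical estimator: smoothed log ratio of each k-mer's frequency in bound vs. unbound sets."""
    pos, neg = Counter(), Counter()
    for s in bound:
        pos.update(kmer_counts(s, k))
    for s in unbound:
        neg.update(kmer_counts(s, k))
    vocab = sorted(set(pos) | set(neg))  # sorted so that later tie-breaking is deterministic
    pos_total = sum(pos.values()) + pseudo * len(vocab)
    neg_total = sum(neg.values()) + pseudo * len(vocab)
    return {m: math.log((pos[m] + pseudo) / pos_total) - math.log((neg[m] + pseudo) / neg_total)
            for m in vocab}


def linear_score(seq, weights, k):
    """Linear k-mer model: weighted sum of k-mer counts; k-mers without a weight contribute 0."""
    return sum(weights.get(m, 0.0) * c for m, c in kmer_counts(seq, k).items())


def accuracy(bound, unbound, weights, k, threshold=0.0):
    """Fraction classified correctly when score > threshold means 'bound'."""
    hits = sum(linear_score(s, weights, k) > threshold for s in bound)
    hits += sum(linear_score(s, weights, k) <= threshold for s in unbound)
    return hits / (len(bound) + len(unbound))


def prune(weights, n):
    """Keep the n k-mers with the largest absolute weight (the stable sort keeps earlier k-mers on ties)."""
    top = sorted(weights, key=lambda m: abs(weights[m]), reverse=True)[:n]
    return {m: weights[m] for m in top}


def choose_n_by_cv(bound, unbound, k, candidates, folds=3):
    """Return (n, mean held-out accuracy) for the smallest n with the best cross-validated accuracy.

    Assumes each class has at least `folds` sequences; fold f holds out every index i with i % folds == f.
    """
    best_n, best_acc = None, -1.0
    for n in sorted(candidates):
        accs = []
        for f in range(folds):
            train_b = [s for i, s in enumerate(bound) if i % folds != f]
            test_b = [s for i, s in enumerate(bound) if i % folds == f]
            train_u = [s for i, s in enumerate(unbound) if i % folds != f]
            test_u = [s for i, s in enumerate(unbound) if i % folds == f]
            w = prune(fit_log_ratio_weights(train_b, train_u, k), n)
            accs.append(accuracy(test_b, test_u, w, k))
        mean_acc = sum(accs) / folds
        if mean_acc > best_acc:  # strict '>' keeps the smaller n when accuracies tie
            best_n, best_acc = n, mean_acc
    return best_n, best_acc
```

Resolving ties toward fewer k-mers mirrors the observation that the model can often be shrunk without a large loss in accuracy; a real pipeline would also need reverse-complement handling and a proper regression fit, which this sketch omits.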
Pages: 9
Related papers (50 in total)
  • [41] Proteome-wide prediction of protein-protein interactions from high-throughput data
    Liu, ZhiPing
    Chen, Luonan
    PROTEIN & CELL, 2012, 3 (07) : 508 - 520
  • [42] Engineering High Affinity Protein-Protein Interactions Using a High-Throughput Microcapillary Array Platform
    Lim, Sungwon
    Chen, Bob
    Kariolis, Mihalis S.
    Dimov, Ivan K.
    Baer, Thomas M.
    Cochran, Jennifer R.
    ACS CHEMICAL BIOLOGY, 2017, 12 (02) : 336 - 341
  • [43] High-throughput protein characterization by complementation using DNA barcoded fragment libraries
    Biggs, Bradley W.
    Price, Morgan N.
    Lai, Dexter
    Escobedo, Jasmine
    Fortanel, Yuridia
    Huang, Yolanda Y.
    Kim, Kyoungmin
    Trotter, Valentine V.
    Kuehl, Jennifer V.
    Lui, Lauren M.
    Chakraborty, Romy
    Deutschbauer, Adam M.
    Arkin, Adam P.
    MOLECULAR SYSTEMS BIOLOGY, 2024, 20 (11) : 1207 - 1229
  • [44] High-throughput screening for protein-protein interactions using two-hybrid assay
    Cagney, G
    Uetz, P
    Fields, S
    APPLICATIONS OF CHIMERIC GENES AND HYBRID PROTEINS, PT C, 2000, 328 : 3 - 14
  • [45] Inferring protein-protein interactions through high-throughput interaction data from diverse organisms
    Liu, Y
    Liu, NJ
    Zhao, HY
    BIOINFORMATICS, 2005, 21 (15) : 3279 - 3285
  • [47] Selection of DNA aptamers for ovarian cancer biomarker HE4 using CE-SELEX and high-throughput sequencing
    Eaton, Rachel M.
    Shallcross, Jamie A.
    Mael, Liora E.
    Mears, Kepler S.
    Minkoff, Lisa
    Scoville, Delia J.
    Whelan, Rebecca J.
    ANALYTICAL AND BIOANALYTICAL CHEMISTRY, 2015, 407 (23) : 6965 - 6973
  • [48] Uncovering domain motif interactions using high-throughput protein-protein interaction detection methods
    Idrees, Sobia
    Paudel, Keshav Raj
    Sadaf, Tayyaba
    Hansbro, Philip M.
    FEBS LETTERS, 2024, 598 (07) : 725 - 742
  • [49] Filtering high-throughput protein-protein interaction data using a combination of genomic features
    Patil, A
    Nakamura, H
    BMC BIOINFORMATICS, 2005, 6 (1)