Predicting gene expression from genome wide protein binding profiles

Cited by: 2
Authors
Ferdous, Mohsina M. [1]
Bao, Yanchun [2]
Vinciotti, Veronica [3]
Liu, Xiaohui [1]
Wilson, Paul [4]
Affiliations
[1] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, Middx, England
[2] Univ Essex, ISER, Colchester CO4 3SQ, Essex, England
[3] Brunel Univ London, Dept Math, Uxbridge UB8 3PH, Middx, England
[4] GlaxoSmithKline Med Res Ctr, Computat Biol, Stevenage SG1 2NY, Herts, England
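Funding
UK Engineering and Physical Sciences Research Council (EPSRC); UK Economic and Social Research Council (ESRC); UK Biotechnology and Biological Sciences Research Council (BBSRC)
Keywords
ChIP-seq; Epigenetics; Gene expression; Markov random field; Machine learning; Microarray
DOI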
10.1016/j.neucom.2017.09.094
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
High-throughput technologies such as chromatin immunoprecipitation (IP) followed by next-generation sequencing (ChIP-seq), in combination with gene expression studies, have enabled researchers to investigate relationships between the distribution of chromosome-associated proteins and the regulation of gene transcription on a genome-wide scale. Several attempts at integrative analyses have identified direct relationships between the two processes. However, a comprehensive understanding of the regulatory events remains elusive. This is in part due to the scarcity of robust analytical methods for the detection of binding regions from ChIP-seq data. In this paper, we have applied a recently proposed Markov random field model for the detection of enriched binding regions under different biological conditions and time points. The method accounts for spatial dependencies and IP efficiencies, which can vary significantly between different experiments. We further defined the enriched chromosomal binding regions as distinct genomic features, such as promoter, exon, intron, and distal intergenic, and then investigated how predictive each of these features is of gene expression activity using machine learning techniques, including neural networks, decision trees and random forests. The analysis of a ChIP-seq time-series dataset comprising six protein markers and associated microarray data, obtained from the same biological samples, shows promising results and identifies biologically plausible relationships between the protein profiles and gene regulation. © 2017 The Authors. Published by Elsevier B.V.
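As an illustration of the feature-definition step described in the abstract, the following is a minimal sketch of how an enriched binding region might be labelled as promoter, exon, intron, or distal intergenic, and then turned into per-gene binary inputs for a classifier. It is not the authors' pipeline: the `Gene` record, the 2 kb promoter window (`PROMOTER_UPSTREAM`), the midpoint rule, and the plus-strand-only simplification are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed promoter window upstream of the gene start; not taken from the paper.
PROMOTER_UPSTREAM = 2000


@dataclass
class Gene:
    """Hypothetical gene model; plus strand assumed for simplicity."""
    name: str
    start: int                     # transcription start site
    end: int                       # transcription end
    exons: List[Tuple[int, int]]   # inclusive (start, end) intervals


def annotate_region(region_start: int, region_end: int,
                    genes: List[Gene]) -> Tuple[str, Optional[str]]:
    """Label an enriched region by where its midpoint falls."""
    mid = (region_start + region_end) // 2
    for g in genes:
        if g.start - PROMOTER_UPSTREAM <= mid < g.start:
            return "promoter", g.name
        if g.start <= mid <= g.end:
            in_exon = any(s <= mid <= e for s, e in g.exons)
            return ("exon" if in_exon else "intron"), g.name
    # No gene body or promoter window contains the midpoint.
    return "distal intergenic", None


def feature_vector(regions: List[Tuple[int, int]], genes: List[Gene],
                   gene_name: str) -> List[int]:
    """Binary indicators [promoter, exon, intron] for one gene."""
    hits = {annotate_region(s, e, genes) for s, e in regions}
    return [int((c, gene_name) in hits) for c in ("promoter", "exon", "intron")]
```

Vectors of this kind, computed per protein marker and time point, could then be passed to any of the classifiers the abstract names (neural networks, decision trees, random forests) with expression status as the target.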
Pages: 1490-1499
Page count: 10
Related papers
25 in total
  • [1] Joint modeling of ChIP-seq data via a Markov random field model
    Bao, Yanchun
    Vinciotti, Veronica
    Wit, Ernst
    't Hoen, Peter A. C.
    [J]. BIOSTATISTICS, 2014, 15 (02) : 296 - 310
  • [2] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [3] Carlson M., TXDB MMUSCULUS UCSC
  • [4] beadarray: R classes and methods for Illumina bead-based data
    Dunning, Mark J.
    Smith, Mike L.
    Ritchie, Matthew E.
    Tavare, Simon
    [J]. BIOINFORMATICS, 2007, 23 (16) : 2183 - 2184
  • [5] Identification of context-specific gene regulatory networks with GEMULA-gene expression modeling using LAsso
    Geeven, Geert
    van Kesteren, Ronald E.
    Smit, August B.
    de Gunst, Mathisca C. M.
    [J]. BIOINFORMATICS, 2012, 28 (02) : 214 - 221
  • [6] PTHGRN: unraveling post-translational hierarchical gene regulatory networks using PPI, ChIP-seq and gene expression data
    Guan, Daogang
    Shao, Jiaofang
    Zhao, Zhongying
    Wang, Panwen
    Qin, Jing
    Deng, Youping
    Boheler, Kenneth R.
    Wang, Junwen
    Yan, Bin
    [J]. NUCLEIC ACIDS RESEARCH, 2014, 42 (W1) : W130 - W136
  • [7] Structure and mechanism of the RNA polymerase II transcription machinery
    Hahn, S
    [J]. NATURE STRUCTURAL & MOLECULAR BIOLOGY, 2004, 11 (05) : 394 - 403
  • [8] Introns and gene expression: Cellular constraints, transcriptional regulation, and evolutionary consequences
    Heyn, Patricia
    Kalinka, Alex T.
    Tomancak, Pavel
    Neugebauer, Karla M.
    [J]. BIOESSAYS, 2015, 37 (02) : 148 - 154
  • [9] Hurd Paul J., 2009, Briefings in Functional Genomics & Proteomics, V8, P174, DOI 10.1093/bfgp/elp013
  • [10] Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
    Langmead, Ben
    Trapnell, Cole
    Pop, Mihai
    Salzberg, Steven L.
    [J]. GENOME BIOLOGY, 2009, 10 (03):