PREDICTION OF 3D CHROMATIN STRUCTURE USING RECURRENT NEURAL NETWORKS

Cited by: 0
Authors
Rozenwald, Michal [1]
Khrameeva, Ekaterina [2]
Sapunov, Grigory [1]
Gelfand, Mikhail [3]
Affiliations
[1] Natl Res Univ, Higher Sch Econ, Dept Comp Sci, Moscow, Russia
[2] Skolkovo Inst Sci & Technol, Ctr Life Sci, Moscow, Russia
[3] RAS, Inst Informat Transmiss Problems, Moscow, Russia
Source
PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2018
Keywords
3D chromatin structure; Topologically Associating Domains; Machine Learning; Recurrent Neural Networks
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Hi-C technology makes it possible to obtain data on chromatin interactions. The technique has revealed many principles of chromosomal folding, including the subdivision of the genome into Topologically Associating Domains (TADs). Moreover, correlations have been discovered between chromatin structure and various factors such as binding sites of the transcriptional repressor CTCF, replication timing, and many epigenetic features [1-3]. Our study applies Machine Learning methods to explore the 3D structure of chromatin. We predicted the TAD annotation from a comprehensive set of predictors that includes chromatin marks and histone modifications. Data from the following ChIP-seq experiments were selected: Chriz, CTCF, Su(Hw), BEAF-32, CP190, Smc3, GAF, H3K27me3, H3K27ac, H3K36me1, H3K36me3, H3K4me1, H3K9ac, H3K9me1, H3K9me2, H3K9me3, H4K16ac. The target value is a characteristic derived from the Topologically Associating Domains annotation produced by the Armatus software [4]. The objects are 20,000 bp DNA sequence fragments (bins) of the fruit fly Drosophila melanogaster. We consider linear regression models with three types of regularization (Lasso, Ridge, Elastic Net) and Neural Networks. The sequential ordering of the DNA bins by physical distance justifies the use of Recurrent Neural Networks. We built RNN architectures with different numbers of LSTM units and input sizes from 1 to 10 DNA bins. The predictive models were trained and evaluated using a weighted MSE score. As a baseline for estimating model performance, the mean target value of the training dataset was used as a constant prediction. The best weighted MSE score was achieved by a bidirectional LSTM RNN with 64 units. The input size of this model is six DNA bins, which also equals the average TAD size. The most accurate RNN strongly outperforms the constant prediction and all four linear models.
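The evaluation setup described above can be illustrated with a minimal sketch. The abstract does not define the weighting scheme or how bins are grouped into model inputs, so everything here is an assumption: the per-object weights are taken as a given list, windows are built from consecutive bins with a stride of one, and the function names `weighted_mse`, `constant_baseline`, and `sliding_windows` are hypothetical.

```python
from statistics import fmean


def weighted_mse(y_true, y_pred, weights):
    """Weighted mean squared error: sum(w * (y - p)^2) / sum(w).

    The weighting scheme itself is an assumption; the abstract does not
    say how the weights were chosen.
    """
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    squared = (w * (y - p) ** 2 for y, p, w in zip(y_true, y_pred, weights))
    return sum(squared) / total


def constant_baseline(y_train, n):
    """Predict the mean target of the training set for each of n objects."""
    return [fmean(y_train)] * n


def sliding_windows(bins, window=6):
    """Group consecutive bins into overlapping windows of a fixed length.

    A window of six bins mirrors the input size of the best model; the
    stride of one is an assumption.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [bins[i:i + window] for i in range(len(bins) - window + 1)]
```

A model would then be considered useful only if its `weighted_mse` on held-out bins is lower than that of `constant_baseline` computed with the same weights.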
The protein Chriz is known to be associated with the formation of chromatin domains in Drosophila melanogaster [5]. The linear models with L1 regularization selected the feature corresponding to Chriz as the most informative one. A prioritization of feature importance was also obtained.
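How L1 regularization can yield a feature ranking is sketched below. This is not the authors' implementation: it is a plain coordinate-descent Lasso without an intercept, assuming features and target are already centered, and the names `soft_threshold`, `lasso_coordinate_descent`, and `rank_features` are hypothetical. Features whose coefficients are shrunk exactly to zero are treated as uninformative, and the rest are ordered by absolute coefficient.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.

    X is a list of rows, y a list of targets. No intercept is fitted,
    so inputs are assumed to be centered beforehand.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return w


def rank_features(names, weights):
    """Order features by absolute coefficient, largest first."""
    return sorted(zip(names, weights), key=lambda item: abs(item[1]), reverse=True)
```

In practice the regularization strength `alpha` would be tuned on validation data, since it controls how many coefficients are driven to zero.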
Pages: 2488 / 2488
Number of pages: 1
Related papers
5 in total
  • [1] Filippova D., 2014, ALGORITHM MOL BIOL, P9
  • [2] Fortin J, 2015, GENOME BIOL, P16
  • [3] Gortchakov, AA; Eggert, H; Gan, M; Mattow, J; Zhimulev, IF; Saumweber, H. Chriz, a chromodomain protein specific for the interbands of Drosophila melanogaster polytene chromosomes [J]. CHROMOSOMA, 2005, 114 (01): 54-66
  • [4] Schreiber J., 2017, BIORXIV, DOI 10.1101/103614
  • [5] Ulianov, Sergey V.; Khrameeva, Ekaterina E.; Gavrilov, Alexey A.; Flyamer, Ilya M.; Kos, Pavel; Mikhaleva, Elena A.; Penin, Aleksey A.; Logacheva, Maria D.; Imakaev, Maxim V.; Chertovich, Alexander; Gelfand, Mikhail S.; Shevelyov, Yuri Y.; Razin, Sergey V. Active chromatin and transcription play a key role in chromosome partitioning into topologically associating domains [J]. GENOME RESEARCH, 2016, 26 (01): 70-84