Gene expression classification using epigenetic features and DNA sequence composition in the human embryonic stem cell line H1

Cited by: 10
Authors
Su, Wen-Xia [1]
Li, Qian-Zhong [1]
Zhang, Lu-Qiang [1]
Fan, Guo-Liang [1]
Wu, Cheng-Yan [1]
Yan, Zhen-He [1]
Zuo, Yong-Chun [2]
Affiliations
[1] Inner Mongolia Univ, Sch Phys Sci & Technol, Lab Theoret Biophys, Hohhot 010021, Peoples R China
[2] Inner Mongolia Univ, Coll Life Sci, Key Lab Mammalian Reprod Biol & Biotechnol, Minist Educ, Hohhot 010021, Peoples R China
Funding
National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education;
Keywords
Epigenetic factors; Support vector machines; Embryonic stem cells; Highly expressed gene; Lowly expressed gene; Web-server; HISTONE MODIFICATION PROFILES; TRANSCRIPTION FACTOR-BINDING; CHIP-SEQ; CHROMATIN FEATURES; METHYLATION; PREDICTION; PROTEINS; SITES; PATTERNS;
DOI
10.1016/j.gene.2016.07.059
Chinese Library Classification
Q3 [Genetics];
Subject classification codes
071007; 090102;
Abstract
Epigenetic factors are known to correlate with gene expression. However, quantitative models that accurately classify highly and lowly expressed genes based on epigenetic factors are currently lacking. In this study, a new machine learning method that combines histone modifications, DNA methylation, DNA accessibility, transcription factors, and trinucleotide composition with support vector machines (SVM) is developed for the human embryonic stem cell line H1. The results indicate that predictive accuracy improves markedly when the epigenetic features are considered. The predictive accuracy and Matthews correlation coefficient of the best model reach 95.96% and 0.92 in the 10-fold cross-validation test, and 95.58% and 0.92 in the independent dataset test, respectively. Our model provides a good way to judge whether a gene is highly or lowly expressed from genetic and epigenetic data when expression data for the gene are lacking. A web server, GECES, implementing our analysis method is available at http://202.207.14.87:8032/fuwu/GECES/index.asp, so that other scientists can easily obtain their desired results without going through the mathematical details. (C) 2016 Elsevier B.V. All rights reserved.
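As a rough illustration of two quantities named in the abstract, the sketch below computes a 64-dimensional trinucleotide composition vector and the accuracy and Matthews correlation coefficient from a confusion matrix. It is not the authors' implementation: the use of overlapping windows, the function names, and the example counts are assumptions made for illustration, and the abstract does not say which genomic region the composition is computed over.

```python
from itertools import product
from math import sqrt

# All 64 trinucleotides in a fixed order (AAA, AAC, ..., TTT).
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_composition(seq):
    """Return the frequencies of the 64 trinucleotides in a DNA sequence.

    Assumption: overlapping windows with step 1; windows containing
    symbols other than A, C, G, T (for example N) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TRINUCLEOTIDES]


def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    features = trinucleotide_composition("ACGTAC")
    print(len(features), sum(features))
    # Hypothetical confusion counts, not taken from the paper.
    print(accuracy_and_mcc(tp=40, tn=40, fp=10, fn=10))
```

A feature vector of this kind could be concatenated with per-gene epigenetic signal values before being passed to an SVM library such as LIBSVM, which appears in the reference list below.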
Pages: 227-234
Page count: 8
Related papers
41 items in total
[1]   High-resolution profiling of histone methylations in the human genome [J].
Barski, Artem ;
Cuddapah, Suresh ;
Cui, Kairong ;
Roh, Tae-Young ;
Schones, Dustin E. ;
Wang, Zhibin ;
Wei, Gang ;
Chepelev, Iouri ;
Zhao, Keji .
CELL, 2007, 129 (04) :823-837
[2]   Cell type-specific DNA methylation patterns in the human breast [J].
Bloushtain-Qimron, Noga ;
Yao, Jun ;
Snyder, Eric L. ;
Shipitsin, Michail ;
Campbell, Lauren L. ;
Mani, Sendurai A. ;
Hua, Min ;
Chen, Haiyan ;
Ustyansky, Vadim ;
Antosiewicz, Jessica E. ;
Argani, Pedram ;
Halushka, Marc K. ;
Thomson, James A. ;
Pharoah, Paul ;
Porgador, Angel ;
Sukumar, Saraswati ;
Parsons, Ramon ;
Richardson, Andrea L. ;
Stampfer, Martha R. ;
Gelman, Rebecca S. ;
Nikolskaya, Tatiana ;
Nikolsky, Yuri ;
Polyak, Kornelia .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2008, 105 (37) :14076-14081
[3]   Epigenetic regulation of glucose-stimulated osteopontin (OPN) expression in diabetic kidney [J].
Cai, Mengyin ;
Bompada, Pradeep ;
Atac, David ;
Laakso, Markku ;
Groop, Leif ;
De Marinis, Yang .
BIOCHEMICAL AND BIOPHYSICAL RESEARCH COMMUNICATIONS, 2016, 469 (01) :108-113
[4]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[5]   iRNA-Methyl: Identifying N6-methyladenosine sites using pseudo nucleotide composition [J].
Chen, Wei ;
Feng, Pengmian ;
Ding, Hui ;
Lin, Hao ;
Chou, Kuo-Chen .
ANALYTICAL BIOCHEMISTRY, 2015, 490 :26-33
[6]   iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition [J].
Chen, Wei ;
Feng, Peng-Mian ;
Lin, Hao ;
Chou, Kuo-Chen .
NUCLEIC ACIDS RESEARCH, 2013, 41 (06) :e68
[7]   Prediction of the subcellular location of apoptosis proteins [J].
Chen, Ying-Li ;
Li, Qian-Zhong .
JOURNAL OF THEORETICAL BIOLOGY, 2007, 245 (04) :775-783
[8]   Modeling the relative relationship of transcription factor binding and histone modifications to gene expression levels in mouse embryonic stem cells [J].
Cheng, Chao ;
Gerstein, Mark .
NUCLEIC ACIDS RESEARCH, 2012, 40 (02) :553-568
[9]   A statistical framework for modeling gene expression using chromatin features and application to modENCODE datasets [J].
Cheng, Chao ;
Yan, Koon-Kiu ;
Yip, Kevin Y. ;
Rozowsky, Joel ;
Alexander, Roger ;
Shou, Chong ;
Gerstein, Mark .
GENOME BIOLOGY, 2011, 12 (02)
[10]   PREDICTION OF PROTEIN STRUCTURAL CLASSES [J].
CHOU, KC ;
ZHANG, CT .
CRITICAL REVIEWS IN BIOCHEMISTRY AND MOLECULAR BIOLOGY, 1995, 30 (04) :275-349