An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers

Times cited: 6
Authors
Zhang, Tianjiao [1]
Wang, Rongjie [1]
Jiang, Qinghua [2]
Wang, Yadong [1]
Affiliations
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin 150001, Peoples R China
[2] Harbin Inst Technol, Sch Life Sci & Technol, Harbin 150001, Peoples R China
Keywords
Enhancer; gene expression regulation; sequence features; transcriptional features; epigenetic features; information gain; chromatin signatures; transcription; dynamics; distinct
DOI
10.2174/1574893614666191120141032
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Enhancers are cis-regulatory DNA elements that enhance gene expression. Because most enhancers are located far from transcription start sites, they are difficult to identify. As with other regulatory elements, the regions around enhancers carry a variety of features that can help in enhancer recognition.
Objective: The classification power of these features differs significantly, so existing methods that use only one or a few features to identify enhancers vary greatly in performance. Evaluating the classification power of each feature can therefore improve enhancer prediction.
Methods: We present an evaluation method based on Information Gain (IG), which measures how much the entropy of enhancer recognition decreases once a given feature is known. To validate the method, we computed the Single Feature Prediction Accuracy (SFPA) for each feature.
Results: The average IG values of the sequence, transcriptional, and epigenetic features are 0.068, 0.213, and 0.299, respectively. Measured by SFPA, the average AUC values of the sequence, transcriptional, and epigenetic features are 0.534, 0.605, and 0.647, respectively. These verification results are consistent with the IG-based evaluation.
Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.
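The two quantities named in the abstract can be illustrated with a short Python sketch. It computes the information gain IG(Y; X) = H(Y) - H(Y | X) of one feature X with respect to a binary enhancer label Y (1 = enhancer, 0 = non-enhancer). It also computes a rank-based (Mann-Whitney) AUC that uses the raw feature value directly as the prediction score. Several details are assumptions made for illustration only and are not the authors' procedure: the equal-width binning of continuous features, the default of 4 bins, the function names, and the toy data. The abstract does not state how features were discretized or which classifier produced the SFPA AUC values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values, n_bins=4):
    """Equal-width binning of a continuous feature (an assumed choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: a single bin
    width = (hi - lo) / n_bins
    # min(...) puts the maximum value into the last bin instead of bin n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(feature, labels, n_bins=4):
    """IG(Y; X) = H(Y) - sum over bins x of P(X = x) * H(Y | X = x)."""
    bins = discretize(feature, n_bins)
    n = len(labels)
    groups = {}
    for b, y in zip(bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def single_feature_auc(feature, labels):
    """AUC of the raw feature used as a score; ties count as 1/2."""
    pos = [f for f, y in zip(feature, labels) if y == 1]
    neg = [f for f, y in zip(feature, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper
    labels = [1, 1, 1, 0, 0, 0]
    signal = [0.9, 0.8, 0.7, 0.2, 0.1, 0.3]  # e.g. an epigenetic signal value
    print(information_gain(signal, labels))  # 1.0: bins separate the classes
    print(single_feature_auc(signal, labels))  # 1.0
```

A feature whose bins separate enhancers from non-enhancers reaches the maximum IG of H(Y), which is 1 bit for balanced classes. A constant or uninformative feature scores near 0 IG and near 0.5 AUC. This parallels the agreement between the IG and SFPA rankings reported above.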
Pages: 574-580
Number of pages: 7