Imbalanced Class Learning in Epigenetics

Times cited: 13
Authors
Haque, M. Muksitul [1, 2]
Skinner, Michael K. [1 ]
Holder, Lawrence B. [2 ]
Affiliations
[1] Washington State Univ, Sch Biol Sci, Ctr Reprod Biol, Pullman, WA 99164 USA
[2] Washington State Univ, Sch Elect Engn & Comp Sci, Pullman, WA 99164 USA
Keywords
biology; computational molecular biology; DNA; genomics; machine learning; TRANSGENERATIONAL INHERITANCE; CLASSIFICATION; DISEASE; TARGETS
DOI
10.1089/cmb.2014.0008
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In machine learning, one of the important criteria for higher classification accuracy is a balanced dataset. Datasets with a large ratio between minority and majority classes face hindrance in learning using any classifier. Datasets having a magnitude difference in number of instances between the target concept result in an imbalanced class distribution. Such datasets can range from biological data, sensor data, medical diagnostics, or any other domain where labeling any instances of the minority class can be time-consuming or costly or the data may not be easily available. The current study investigates a number of imbalanced class algorithms for solving the imbalanced class distribution present in epigenetic datasets. Epigenetic (DNA methylation) datasets inherently come with few differentially DNA methylated regions (DMR) and with a higher number of non-DMR sites. For this class imbalance problem, a number of algorithms are compared, including the TAN+AdaBoost algorithm. Experiments performed on four epigenetic datasets and several known datasets show that an imbalanced dataset can have similar accuracy as a regular learner on a balanced dataset.
Pages: 492-507
Number of pages: 16