Studying the Effect of Class Imbalance in Ocean Turbine Fault Data on Reliable State Detection

Cited by: 3
Authors
Duhaney, Janell [1]
Khoshgoftaar, Taghi M. [1]
Napolitano, Amri [1]
Affiliations
[1] Florida Atlantic Univ, Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Source
2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1 | 2012
Keywords
ocean turbine; state detection; class imbalance; condition monitoring; classification
DOI
10.1109/ICMLA.2012.53
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Class imbalance is prevalent in many real-world datasets. It occurs when one or more classes contain significantly fewer examples than the remaining classes. When trained on highly imbalanced datasets, traditional machine learning techniques often simply ignore the minority class(es) and label all instances as the majority class in order to maximize accuracy. This problem has been studied in many domains, but there is little or no research on the effect of class imbalance in fault data for condition monitoring of an ocean turbine. This study makes a first effort toward bridging that gap by providing insight into how class imbalance in vibration data can impact a learner's ability to reliably identify changes in the ocean turbine's operational state. To do so, we empirically evaluate the performance of three popular, but very different, machine learning algorithms when trained to distinguish between a normal and an abnormal state on four datasets with varying class distributions (one balanced and three imbalanced). All data used in this study were collected from the testbed for an ocean turbine and were undersampled to simulate the different levels of imbalance. We find here, as in other domains, that all three learners suffered overall when trained on data with a highly skewed class distribution (0.1% of examples captured in a faulty/abnormal state, with the remaining 99.9% captured in a normal operational state). We note, however, that the Logistic Regression and Decision Tree classifiers performed better when only 5% of the examples represented an abnormal state (the remaining 95% therefore indicating normal operation) than they did when no imbalance was present.
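The experimental design in the abstract — undersampling one class of a balanced dataset to produce controlled imbalance levels, then training classifiers on each variant — can be sketched as follows. This is a minimal illustration on synthetic stand-in data using scikit-learn, not the authors' vibration dataset or exact protocol; the feature generator, the imbalance fractions chosen here, and the use of balanced accuracy as the metric are all assumptions for demonstration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

# Stand-in for the vibration features: a balanced two-class dataset
# (label 0 = normal operation, label 1 = abnormal/faulty state).
X, y = make_classification(n_samples=4000, n_features=10,
                           weights=[0.5, 0.5], flip_y=0, random_state=0)

def undersample_minority(X, y, minority_frac):
    """Randomly discard class-1 examples until they make up roughly
    `minority_frac` of the resulting dataset."""
    maj = np.flatnonzero(y == 0)
    mino = np.flatnonzero(y == 1)
    n_min = int(minority_frac * len(maj) / (1 - minority_frac))
    keep = np.concatenate([maj, rng.choice(mino, size=n_min, replace=False)])
    return X[keep], y[keep]

# One balanced distribution plus progressively imbalanced ones.
for frac in (0.5, 0.25, 0.05):
    Xs, ys = undersample_minority(X, y, frac)
    Xtr, Xte, ytr, yte = train_test_split(Xs, ys, stratify=ys, random_state=0)
    for clf in (LogisticRegression(max_iter=1000),
                DecisionTreeClassifier(random_state=0)):
        clf.fit(Xtr, ytr)
        score = balanced_accuracy_score(yte, clf.predict(Xte))
        print(f"{frac:>5.0%} minority  {type(clf).__name__:<24} "
              f"balanced accuracy = {score:.3f}")
```

Balanced accuracy is used rather than plain accuracy because, on a 95/5 split, a classifier that labels everything "normal" already scores 95% accuracy while detecting no faults at all — exactly the failure mode the paper studies.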
Pages: 268-275
Page count: 8