An Analysis of Several Machine Learning Algorithms for Imbalanced Classes

Citations: 0
Authors
Datta, Soma [1]
Arputharaj, Anuprabha [1]
Affiliation
[1] Univ Houston Clear Lake, Software Engn, Houston, TX 77058 USA
Source
2018 5TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI) | 2018
Keywords
Imbalance class; Machine learning; Datasets; Classification; Algorithms; Precision; Recall; Decision Trees; Association Mining;
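DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract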
Imbalanced data typically refers to classification problems in which the classes are not represented equally. In real-world applications, most classification datasets do not have exactly equal numbers of instances per class, and problems arise when the classes are imbalanced. Most machine learning algorithms work best when the number of instances in each class is roughly equal, and only specific algorithms are designed to deal with imbalanced classes. This survey assesses the views of several authors on the imbalanced class problem in data mining. It draws on information from multiple datasets and aims to obtain several perspectives on imbalanced classes. The survey is carried out by analysing detailed reports on several datasets from the UCI Machine Learning Repository, which is maintained by the Center for Machine Learning and Intelligent Systems. The study covers 20 datasets and their methodologies, drawn from 26 articles, in a comparative study of imbalanced class problems across fuzzy classification, decision trees, association mining, and ensemble methods. The class imbalance problem is extremely common in practice and is observed in various disciplines, including medical diagnosis, fraud detection, anomaly detection, oil spill detection, and facial recognition. Because a disproportionate number of class instances affects machine learning, and because the problem is so prevalent, several approaches to dealing with it have been studied. This study aims to present one such approach for handling different datasets.
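The abstract's point about imbalanced classes, together with the Precision and Recall keywords, can be illustrated with a minimal sketch. This is not the authors' method: the label counts, the majority-class baseline, and the helper names (`imbalance_ratio`, `accuracy`, `precision_recall`) are all assumptions made up for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class; 0.0 when a denominator is zero."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical dataset: 95 majority-class (0) and 5 minority-class (1) instances.
y_true = [0] * 95 + [1] * 5
# Hypothetical baseline that always predicts the majority class.
y_pred = [0] * 100

print("imbalance ratio:", imbalance_ratio(y_true))
print("accuracy:", accuracy(y_true, y_pred))
print("minority precision, recall:", precision_recall(y_true, y_pred, positive=1))
```

On this made-up data the baseline scores high accuracy while never identifying a minority instance, which is why precision and recall on the minority class are usually reported alongside accuracy for imbalanced problems.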
Pages: 22-27
Page count: 6