New Fitness Functions in Genetic Programming for Classification with High-dimensional Unbalanced Data

Authors
Pei, Wenbin [1]
Xue, Bing [1]
Shang, Lin [2]
Zhang, Mengjie [1]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Jiangsu, Peoples R China
Keywords
Classification; Genetic Programming; Fitness Functions; High-dimensionality; Class Imbalance; Feature Selection
DOI
10.1109/cec.2019.8789974
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract
High dimensionality and class imbalance are two main challenges in classification, and a growing number of datasets exhibit both. Genetic programming (GP) has been successfully applied to high-dimensional classification tasks. However, most existing GP methods can suffer from a performance bias when the class distribution is unbalanced. Using fitness functions for cost adjustment is one of the most important ways to address class imbalance in GP. This paper develops new fitness functions in GP for classification with high-dimensional unbalanced data. Two fitness functions are proposed to improve on the performance of traditional accuracy measures, and one is proposed to approximate the Area Under the Curve (AUC) in order to reduce training time. Experiments on six high-dimensional unbalanced datasets show that the proposed fitness functions outperform existing fitness functions.
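The abstract does not define the three proposed fitness functions, so the sketch below does not reproduce them. It only illustrates, under that assumption, two standard measures of the kind the abstract refers to: a class-balanced accuracy (the mean of the per-class accuracies) and an exact pairwise AUC estimate based on the Wilcoxon-Mann-Whitney statistic. The function names `balanced_accuracy` and `auc_wmw`, and the convention that label 1 is the minority class, are hypothetical choices made for this example.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the per-class accuracies (assumed: 1 = minority, 0 = majority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / n_pos + tn / n_neg)


def auc_wmw(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Exact AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. The cost is O(n_pos * n_neg) comparisons.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # One minority instance among ten; the classifier predicts the majority class everywhere.
    y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0] * 10
    plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    print(plain)                              # plain accuracy
    print(balanced_accuracy(y_true, y_pred))  # class-balanced accuracy
```

In this toy case plain accuracy rewards a classifier that ignores the minority class, while the balanced measure does not. The pairwise AUC computation grows with the product of the class sizes, which is one reason a cheaper approximation can be attractive when fitness is evaluated for every individual in every generation.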
Pages: 2779-2786
Number of pages: 8
    2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 753 - 756