Improving Instance Selection Methods for Big Data Classification

Authors
Malhat, Mohamed [1]
El Menshawy, Mohamed [1]
Mousa, Hamdy [1]
El Sisi, Ashraf [1]
Affiliation
[1] Menoufia Univ, Fac Comp & Informat, Comp Sci Dept, Shibin Al Kawm, Egypt
Source
2017 13th International Computer Engineering Conference (ICENCO) | 2017
Keywords
Big data; Data Mining; Data Reduction; Instance Selection; REDUCTION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The explosion of data in many application domains has given rise to the term big data. As data volumes grow rapidly, they exceed the capacity and processing capabilities of available data mining algorithms, which then become ineffective. Instance selection therefore becomes a mandatory step before data mining algorithms are applied. Instance selection methods scale a training set down by eliminating redundant, erroneous, and irrelevant instances. Recently, these methods have been adapted to big data sets by splitting the training data into disjoint subsets and applying instance selection to each subset individually. However, the adapted methods show variable performance in reduction rate and classification accuracy. In this work, we propose an operational, unified framework that balances reduction rate against classification accuracy. The framework first splits a training set into class-balanced subsets, which lets us analyze the impact of the splitting process on reduction rate and classification accuracy. It then applies two different instance selection methods to each subset. We implement the framework and test it experimentally on two standard data sets. With the random splitting process as a benchmark, the results show that the class-balanced splitting process is preferable with respect to classification accuracy. The results also show that combining two instance selection methods markedly reduces the performance variability.
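The two steps described in the abstract (class-balanced splitting, then instance selection on each subset) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not name the two instance selection methods or say how they are combined, so the sketch assumes a simplified edited nearest neighbour filter (k = 1) followed by Hart's condensed nearest neighbour. All function names are hypothetical, and the reduction rate uses the common definition 1 - |selected| / |original|, which the abstract does not spell out.

```python
import random
from collections import defaultdict


def class_balanced_split(labels, n_subsets, seed=0):
    """Deal the indices of each class round-robin into disjoint subsets,
    so every subset keeps roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subsets = [[] for _ in range(n_subsets)]
    offset = 0  # rotate the starting subset so leftovers are spread out
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for k, idx in enumerate(members):
            subsets[(offset + k) % n_subsets].append(idx)
        offset += len(members)
    return subsets


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_label(x, pool, X, y, exclude=None):
    """Label of the instance in pool closest to x (optionally skipping one index)."""
    best = min((i for i in pool if i != exclude), key=lambda i: sq_dist(x, X[i]))
    return y[best]


def enn_filter(idxs, X, y):
    """ASSUMED method 1: simplified edited nearest neighbour with k = 1.
    Drops instances whose nearest other instance has a different label."""
    if len(idxs) < 2:
        return list(idxs)
    return [i for i in idxs if nearest_label(X[i], idxs, X, y, exclude=i) == y[i]]


def cnn_condense(idxs, X, y):
    """ASSUMED method 2: Hart's condensed nearest neighbour.
    Keeps only instances needed for 1-NN to classify the subset correctly."""
    if not idxs:
        return []
    store = [idxs[0]]
    changed = True
    while changed:
        changed = False
        for i in idxs:
            if i not in store and nearest_label(X[i], store, X, y) != y[i]:
                store.append(i)
                changed = True
    return store


def select_instances(X, y, n_subsets=2, seed=0):
    """Split into class-balanced subsets, reduce each one, merge the survivors.
    Applying the two methods in sequence is an assumption of this sketch."""
    selected = []
    for subset in class_balanced_split(y, n_subsets, seed):
        selected.extend(cnn_condense(enn_filter(subset, X, y), X, y))
    return sorted(selected)


def reduction_rate(n_original, n_selected):
    """Fraction of training instances removed by selection."""
    return 1.0 - n_selected / n_original
```

Calling `select_instances(X, y, n_subsets=4)` on a list of feature tuples `X` and labels `y` returns the indices of the retained instances. Replacing `class_balanced_split` with a plain shuffled split would give the random-splitting benchmark the abstract compares against.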
Pages: 213-218
Page count: 6