Classification of Wine Quality with Imbalanced Data

Cited by: 0
Authors
Hu, Gongzhu [1]
Xi, Tan [1]
Mohammed, Faraz [1]
Miao, Huaikou [2]
Affiliations
[1] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA
[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200041, Peoples R China
Source
PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT) | 2016
Keywords
classification; imbalanced data; SMOTE; wine quality;
DOI
Not available
Chinese Library Classification
TP301 [Theory, methods];
Discipline classification code
081202;
Abstract
We propose a data analysis approach to classify wines into quality categories. Our analysis used a data set of 4898 white wine observations from the Minho region of Portugal. Because the data set was imbalanced, with about 93% of the observations belonging to one category, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the minority class. The balanced data was used to train a classifier that assigns a wine to one of three categories: high quality, normal quality, or poor quality. Three classification techniques were compared: decision tree, adaptive boosting (AdaBoost), and random forest. In our experiments, random forest produced the lowest error rate of the three.
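The oversampling step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic point is placed at a random position on the line segment between a minority observation and one of its k nearest minority neighbours. This is not the authors' code. The function name `smote`, its parameters, and the toy data are assumptions made for illustration, and the sketch picks base points at random rather than applying a fixed oversampling rate per observation.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration; a simplified form of SMOTE.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(base, minority[j]))
        # Pick one of the k nearest neighbours at random.
        neighbour = minority[rng.choice(others[:k])]
        # Random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up toy data: 2 features, 12 majority vs 3 minority observations.
majority = [[float(i), float(i % 4)] for i in range(12)]
minority = [[20.0, 1.0], [21.0, 2.0], [22.0, 1.5]]

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies between two existing minority observations, the new points stay inside the region the minority class already occupies. In practice a maintained implementation would normally be preferred over a hand-written one, and oversampling would be applied to the training split only so that synthetic points do not leak into the evaluation data.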
Pages: 1712-1717
Page count: 6