Classification of Wine Quality with Imbalanced Data

Cited by: 0
Authors
Hu, Gongzhu [1]
Xi, Tan [1]
Mohammed, Faraz [1]
Miao, Huaikou [2]
Affiliations
[1] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA
[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200041, Peoples R China
Source
PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT) | 2016
Keywords
classification; imbalanced data; SMOTE; wine quality
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
We propose a data analysis approach to classify wine into different quality categories. Our analysis used a data set of 4898 observations of white wines from the Minho region of Portugal. Because the data set was imbalanced, with about 93% of the observations belonging to one category, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the minority class. The balanced data was used to build a classifier that assigns a wine to one of three categories: high quality, normal quality, or poor quality. Three classification techniques were used: decision tree, adaptive boosting (AdaBoost), and random forest. In our experiments, the random forest technique appears to perform best, with the lowest percentage of error among the three.
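The abstract names SMOTE but gives no implementation details. The following is a minimal pure-Python sketch of the interpolation idea behind SMOTE, not the authors' code: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority-class neighbours. The function name `smote_oversample`, its parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data (hypothetical values, not taken from the wine data set).
majority = [[float(i), float(i % 7)] for i in range(93)]
minority = [[100.0, 1.0], [101.0, 2.0], [102.0, 1.5], [100.5, 2.5],
            [101.5, 1.2], [102.5, 2.2], [100.2, 1.8]]

# Generate enough synthetic points to match the majority class size.
new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

In practice a maintained library implementation would usually be preferable to this sketch, and oversampling is normally applied to the training split only so that synthetic points do not leak into the evaluation data.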
Pages: 1712 - 1717
Page count: 6
Related papers
50 in total
  • [21] Weighted Data Gravitation Classification for Standard and Imbalanced Data
    Cano, Alberto
    Zafra, Amelia
    Ventura, Sebastian
    IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43 (06) : 1672 - 1687
  • [22] Product Processing Quality Classification Model for Small-Sample and Imbalanced Data Environment
    Liu, Feixiang
    Dai, Yiru
    COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [23] An automated approach for binary classification on imbalanced data
    Vieira, Pedro Marques
    Rodrigues, Fatima
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (05) : 2747 - 2767
  • [24] MaMiPot: a paradigm shift for the classification of imbalanced data
    Zefrehi, Hossein Ghaderi
    Altincay, Hakan
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2023, 61 (01) : 299 - 324
  • [25] Discriminative feature generation for classification of imbalanced data
    Suh, Sungho
    Lukowicz, Paul
    Lee, Yong Oh
    PATTERN RECOGNITION, 2022, 122
  • [26] Imbalanced Data Stream Classification: Analysis and Solution
    Anjana, Koringa
    Radhika, Kotecha
    Darshana, Patel
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS (ICTIS 2017) - VOL 2, 2018, 84 : 316 - 324
  • [27] Classification of imbalanced data with a geometric digraph family
    Manukyan, Artür
    Ceyhan, Elvan
    Journal of Machine Learning Research, 2016, 17
  • [28] Imbalanced Data Classification Based on Hybrid Methods
    Zhang, Nai-Nan
    Ye, Shao-Zhen
    Chien, Ting-Ying
    PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018), 2018 : 16 - 20
  • [29] MaMiPot: a paradigm shift for the classification of imbalanced data
    Hossein Ghaderi Zefrehi
    Hakan Altınçay
    Journal of Intelligent Information Systems, 2023, 61 : 299 - 324
  • [30] Classification performance assessment for imbalanced multiclass data
    Aguilar-Ruiz, Jesus S.
    Michalak, Marcin
    SCIENTIFIC REPORTS, 2024, 14 (01)