Classification of Wine Quality with Imbalanced Data

Cited by: 0
Authors
Hu, Gongzhu [1]
Xi, Tan [1]
Mohammed, Faraz [1]
Miao, Huaikou [2]
Affiliations
[1] Cent Michigan Univ, Dept Comp Sci, Mt Pleasant, MI 48859 USA
[2] Shanghai Univ, Sch Comp Engn & Sci, Shanghai 200041, Peoples R China
Source
PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT) | 2016
Keywords
classification; imbalanced data; SMOTE; wine quality
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
We propose a data analysis approach for classifying wine into quality categories. Our analysis used a data set of 4898 observations of white wines from the Minho region of Portugal. Because the data set was imbalanced, with about 93% of the observations belonging to one category, we applied the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the minority class. The balanced data was then used to train a classifier that assigns each wine to one of three categories: high quality, normal quality, or poor quality. We used three classification techniques: decision tree, adaptive boosting (AdaBoost), and random forest. In our experiments, random forest appeared to give the best results, with the lowest error rate of the three.
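The oversampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it is a simplified, standard-library-only version of the SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), and the function name `smote`, its parameters, and the toy two-feature data are all assumptions made for illustration. The wine data set itself is not used here.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Simplified SMOTE: each synthetic point lies on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours. (Hypothetical helper, not taken from the paper.)
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data, made up for illustration: 40 majority points, 4 minority points.
majority = [(float(x), float(y)) for x in range(10) for y in range(4)]
minority = [(20.0, 20.0), (21.0, 20.5), (20.5, 22.0), (22.0, 21.0)]

# Generate enough synthetic minority points to match the majority count.
new_points = smote(minority, n_new=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # prints "40 40"
```

After balancing, the combined data would be passed to the classifiers the abstract lists (decision tree, AdaBoost, random forest). In practice a maintained implementation such as the `SMOTE` class in the imbalanced-learn package would normally be preferred over a hand-written one.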
Pages: 1712-1717
Page count: 6