A novel topic modeling based weighting framework for class imbalance learning

Cited by: 1
Authors
Santhiappan, Sudarsun [1]
Chelladurai, Jeshuren [1]
Ravindran, Balaraman [1, 2]
Affiliations
[1] IIT Madras, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] IIT Madras, Robert Bosch Ctr Data Sci & AI RBC DSAI, Chennai, Tamil Nadu, India
Keywords
Class imbalance learning; Data distribution estimation; Directed undersampling; Topic modeling; MACHINE; SMOTE;
DOI
10.1145/3152494.3152496
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Classifying imbalanced data has become an important research problem, because data from most real-world applications follow non-uniform class distributions. A simple way to handle class imbalance is to sample from the dataset so as to compensate for the imbalance in class proportions. When the data distribution is unknown at sampling time, making assumptions about it requires domain knowledge and insight into the dataset. We propose a novel unsupervised, topic-modeling-based weighting framework to estimate the latent data distribution of a dataset. We also propose TODUS, a topics-oriented directed undersampling algorithm that follows the estimated data distribution to draw samples from the dataset. TODUS minimizes the loss of important information that typically gets dropped during random undersampling. We show empirically that TODUS performs better than the other sampling methods compared in our experiments.
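The abstract does not spell out how the topic-based weights are computed or how samples are drawn, so the following is only a minimal sketch of the general idea of weight-directed undersampling, not the authors' TODUS algorithm. It assumes each majority-class sample already has a topic mixture from some topic model; the functions `topic_weights` and `directed_undersample`, and the weighting rule itself, are hypothetical and chosen purely for illustration.

```python
import random


def topic_weights(thetas):
    """Hypothetical weighting rule (an assumption, not taken from the paper).

    thetas[i][t] is the proportion of topic t in sample i. The weight of a
    sample is the dot product of its topic mixture with the average topic
    mixture of the whole set, so samples dominated by prevalent topics
    receive larger weights.
    """
    n_topics = len(thetas[0])
    # Average topic proportion over all samples, used as a proxy for p(topic).
    prior = [sum(th[t] for th in thetas) / len(thetas) for t in range(n_topics)]
    return [sum(th[t] * prior[t] for t in range(n_topics)) for th in thetas]


def directed_undersample(samples, weights, k, seed=0):
    """Draw up to k distinct samples, favoring those with higher weights.

    Uses the Efraimidis-Spirakis scheme for weighted sampling without
    replacement: each item gets the key u ** (1 / w) with u uniform in
    [0, 1), and the k items with the largest keys are kept.
    """
    rng = random.Random(seed)
    keyed = []
    for item, w in zip(samples, weights):
        if w <= 0:
            continue  # items with non-positive weight are never selected
        keyed.append((rng.random() ** (1.0 / w), item))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


# Toy majority class: three samples on topic 0, one sample on topic 1.
majority_ids = [0, 1, 2, 3]
thetas = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights = topic_weights(thetas)
kept = directed_undersample(majority_ids, weights, k=2)
```

In contrast to uniform random undersampling, the selection probability here depends on the estimated weights, so which samples survive is steered by the topic structure rather than left to chance. A different weighting rule can be swapped in without touching the sampler.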
Pages: 20-29
Number of pages: 10