TOMBoost: a topic modeling based boosting approach for learning with class imbalance

Cited by: 0
Authors
Sudarsun Santhiappan
Jeshuren Chelladurai
Balaraman Ravindran
Affiliations
[1] Indian Institute of Technology Madras, Department of Computer Science and Engineering
[2] Robert Bosch Centre for Data Science and AI (RBC-DSAI)
Source
International Journal of Data Science and Analytics | 2024 / Vol. 17
Keywords
Boosting; Class imbalance learning; Data space weighting; Topic modeling; Topic posterior; Topic simplex; Weighting framework
DOI
Not available
Abstract
Classification of imbalanced data is an essential research problem because data from most real-world applications have non-uniform class proportions. Solutions for handling class imbalance depend on how important one data point is relative to another. Directed data sampling and data-level cost-sensitive methods use this importance information to sample from the dataset so that the essential data points are retained and possibly oversampled. In this paper, we propose a novel topic modeling-based weighting framework that assigns importance to the data points in an imbalanced dataset based on topic posterior probabilities estimated with latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA) models. We also propose TOMBoost, a topic-modeled boosting scheme built on the weighting framework and tuned specifically for learning with class imbalance. In an empirical study spanning 40 datasets, we show that TOMBoost, on average, wins or ties on 37 datasets against other boosting and sampling methods. We also show empirically that TOMBoost reduces model bias faster than other popular boosting methods for class imbalance learning.
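The abstract does not give the weighting formula, so the following is only an illustrative sketch of the general idea of turning topic posteriors into per-sample importance weights. It is not the authors' method. The function name `topic_posterior_weights`, the weighting rule (a sample is weighted by how much of its topic mass lies in topics dominated by other classes), and the toy data are all assumptions made for illustration. The topic posteriors are assumed to have been estimated already, for example by an LDA or PLSA model.

```python
def topic_posterior_weights(theta, labels):
    """Hypothetical weighting rule (an assumption, not the paper's formula).

    theta  : list of per-sample topic posterior vectors, each summing to 1
    labels : list of class labels, one per sample
    Returns a list of sample weights that sums to 1.
    """
    n_topics = len(theta[0])
    classes = sorted(set(labels))

    # Soft count of how much posterior mass each class contributes to each topic.
    mass = {c: [0.0] * n_topics for c in classes}
    for row, y in zip(theta, labels):
        for k, p in enumerate(row):
            mass[y][k] += p
    total = [sum(mass[c][k] for c in classes) for k in range(n_topics)]

    # A sample scores higher when its topics are dominated by other classes.
    raw = []
    for row, y in zip(theta, labels):
        score = sum(
            p * (1.0 - mass[y][k] / total[k])
            for k, p in enumerate(row)
            if total[k] > 0
        )
        raw.append(score)

    norm = sum(raw)
    if norm == 0:
        # Degenerate case (for example a single class): fall back to uniform weights.
        return [1.0 / len(raw)] * len(raw)
    return [w / norm for w in raw]


# Toy data (made up): four majority-class samples and one minority-class sample.
toy_theta = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
toy_labels = [0, 0, 0, 0, 1]
toy_weights = topic_posterior_weights(toy_theta, toy_labels)
print(toy_weights)
```

Under this assumed rule, the single minority-class sample receives the largest weight. Weights of this kind could, in principle, serve as the initial sample distribution of a boosting procedure in place of the usual uniform initialization.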
Pages: 389-409
Page count: 20