Stacking density estimation and its oversampling method for continuously imbalanced data in chemometrics

Authors
Zhao, Xin-Ru [1]
Yi, Lun-Zhao [2]
Fu, Guang-Hui [1]
Affiliations
[1] Kunming Univ Sci & Technol, Sch Sci, Kunming 650500, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Food Sci & Engn, Kunming 650500, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced regression; Density estimation; Stacking; Oversampling; Rare value prediction; CLASSIFICATION; REGRESSION
DOI
10.1016/j.chemolab.2025.105366
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Continuously imbalanced data are data whose target variable is continuous and whose distribution is imbalanced. This kind of data is widespread in many practical application areas. However, methods for handling continuously imbalanced data remain relatively scarce, and there is an urgent need for corresponding imbalanced regression methods to enhance the capability of handling such data. First, we propose a Stacking-based density estimation (SDE) method to solve the density estimation problem for continuously imbalanced target variables. SDE links density estimation with the ensemble algorithm called Stacking, and its core concept is the "fusion of multiple perspectives" for accurate estimation. Performing SDE enhances the model's understanding of complex data structures and makes it more accurate in identifying rare values. Subsequently, we investigate an SDE-based oversampling method (SDE-OS). SDE-OS uses SDE to synthesize new rare instances in the rare-value region, achieving customization of rare-value additions. In a series of numerical experiments, SDE estimates densities more accurately than the kernel density estimation method in terms of ANLL. SDE-OS outperforms conventional methods such as SMOGN and SMOTER on various metrics. Therefore, the proposed SDE and SDE-OS are competitive and effective tools for addressing the imbalanced regression problem.
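The abstract does not give the details of SDE or SDE-OS, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the stacked estimator is a convex combination of Gaussian kernel density estimates with different bandwidths, that the combination weights are fitted by EM on out-of-fold densities (a classical way to stack density estimators), that ANLL denotes the average negative log-likelihood, and that oversampling draws seed targets with probability inversely proportional to the estimated density and then jitters them. All function names, bandwidths, and the toy data are hypothetical.

```python
import math
import random


def kde_pdf(x, data, h):
    """Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def stack_weights(y, bandwidths, n_folds=5, n_iter=100):
    """Fit convex stacking weights by EM on out-of-fold KDE densities."""
    n, m = len(y), len(bandwidths)
    oof = [[0.0] * m for _ in range(n)]
    for f in range(n_folds):
        # Each point is scored by base estimators that never saw its fold.
        train = [y[i] for i in range(n) if i % n_folds != f]
        for i in range(f, n, n_folds):
            oof[i] = [kde_pdf(y[i], train, h) for h in bandwidths]
    w = [1.0 / m] * m
    for _ in range(n_iter):
        resp = [0.0] * m
        for row in oof:
            mix = sum(wk * pk for wk, pk in zip(w, row)) + 1e-300
            for k in range(m):
                resp[k] += w[k] * row[k] / mix
        total = sum(resp)
        w = [r / total for r in resp]  # weights stay non-negative, sum to 1
    return w


def stacked_pdf(x, data, bandwidths, w):
    """Density of the stacked (weighted mixture) estimator at x."""
    return sum(wk * kde_pdf(x, data, h) for wk, h in zip(w, bandwidths))


def anll(points, data, bandwidths, w):
    """Average negative log-likelihood of points under the stacked density."""
    logs = [math.log(stacked_pdf(p, data, bandwidths, w) + 1e-300) for p in points]
    return -sum(logs) / len(points)


def oversample_rare(y, bandwidths, w, n_new, rng):
    """Draw synthetic targets, favouring low-density (rare) regions."""
    inv_density = [1.0 / (stacked_pdf(v, y, bandwidths, w) + 1e-300) for v in y]
    seeds = rng.choices(y, weights=inv_density, k=n_new)
    jitter = min(bandwidths)  # assumption: jitter scale = smallest bandwidth
    return [s + rng.gauss(0.0, jitter) for s in seeds]


# Toy target: a dense bulk around 0 and a rare cluster around 6.
rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(180)] + [rng.gauss(6.0, 0.5) for _ in range(20)]
rng.shuffle(y)
y_test = [rng.gauss(0.0, 1.0) for _ in range(45)] + [rng.gauss(6.0, 0.5) for _ in range(5)]

bandwidths = [0.2, 0.5, 1.0]
w = stack_weights(y, bandwidths)
score = anll(y_test, y, bandwidths, w)
synthetic = oversample_rare(y, bandwidths, w, 300, rng)
print("weights:", w, "held-out ANLL:", score)
```

A real oversampler for regression would also have to generate feature vectors for the synthetic targets (as SMOTER and SMOGN do by interpolating or perturbing neighbouring instances); this sketch covers only the target variable.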
Pages: 23