Stacking density estimation and its oversampling method for continuously imbalanced data in chemometrics

Cited by: 0

Authors:

Zhao, Xin-Ru [1]

Yi, Lun-Zhao [2]

Fu, Guang-Hui [1]

Affiliations:

[1] Kunming Univ Sci & Technol, Sch Sci, Kunming 650500, Peoples R China

[2] Kunming Univ Sci & Technol, Fac Food Sci & Engn, Kunming 650500, Peoples R China

Source:

CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS | 2025 / Vol. 261

Funding:

National Natural Science Foundation of China;

Keywords:

Imbalanced regression; Density estimation; Stacking; Oversampling; Rare value prediction; CLASSIFICATION; REGRESSION;

DOI:

10.1016/j.chemolab.2025.105366

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Continuously imbalanced data means that the target variable is continuous and its distribution is imbalanced. This kind of data is widespread in many practical application areas. However, methods for handling continuously imbalanced data remain relatively scarce, and there is an urgent need for corresponding imbalanced regression methods to enhance the capability of handling such data. First, we propose a Stacking-based density estimation (SDE) method to solve the density estimation problem for continuously imbalanced target variables. SDE links density estimation with the ensemble algorithm called Stacking, and its core concept is the "fusion of multiple perspectives for accurate estimation". Performing SDE enhances the model's understanding of complex data structures and makes it more accurate in identifying rare values. Subsequently, we investigate an SDE-based oversampling method (SDE-OS). SDE-OS uses SDE to synthesize new rare instances in the rare-value region, achieving customization of rare-value additions. In a series of numerical experiments, SDE estimates density more accurately than the kernel density estimation method as measured by ANLL. SDE-OS outperforms conventional methods such as SMOGN and SMOTER on various metrics. Therefore, the proposed SDE and SDE-OS are competitive and effective tools for addressing the imbalanced regression problem.
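
The general idea of stacking density estimators can be illustrated with a minimal sketch. It is not the authors' SDE: the abstract does not specify the base estimators or the meta-learner, so this example assumes Gaussian kernel density estimates with three hypothetical bandwidths as base models and combines them with convex weights fitted by EM on held-out points. ANLL is read here as the average negative log-likelihood, which is also an assumption, since the abstract does not expand the acronym. All function names and the synthetic data are illustrative.

```python
import math
import random

def gaussian_kde(train, h):
    """Return a 1-D Gaussian kernel density estimate with bandwidth h."""
    norm = 1.0 / (len(train) * h * math.sqrt(2.0 * math.pi))
    def pdf(y):
        return norm * sum(math.exp(-0.5 * ((y - t) / h) ** 2) for t in train)
    return pdf

def anll(pdf, points):
    """Average negative log-likelihood of points under pdf (lower is better)."""
    return -sum(math.log(max(pdf(y), 1e-300)) for y in points) / len(points)

def stack_weights(dens, iters=200):
    """EM for convex combination weights.

    dens[i][k] is the density that base estimator k assigns to held-out point i.
    """
    k = len(dens[0])
    w = [1.0 / k] * k
    for _ in range(iters):
        resp_sum = [0.0] * k
        for row in dens:
            mix = max(sum(wk * dk for wk, dk in zip(w, row)), 1e-300)
            for j in range(k):
                resp_sum[j] += w[j] * row[j] / mix
        w = [r / len(dens) for r in resp_sum]
    return w

# Synthetic skewed target (assumption): many common values, few rare ones.
random.seed(0)
y = [random.gauss(0.0, 1.0) for _ in range(180)]
y += [random.gauss(5.0, 0.5) for _ in range(20)]
random.shuffle(y)
fit, held_out = y[:100], y[100:]

# Base estimators: KDEs with hypothetical bandwidths, fitted on `fit` only.
bandwidths = [0.1, 0.3, 1.0]
bases = [gaussian_kde(fit, h) for h in bandwidths]

# Meta step: learn the combination weights on held-out densities.
dens = [[b(v) for b in bases] for v in held_out]
weights = stack_weights(dens)

def stacked(v):
    """Stacked density: weighted sum of the base densities."""
    return sum(w * b(v) for w, b in zip(weights, bases))

# Flag the 10% of fitted targets with the lowest stacked density as "rare".
rare = sorted(fit, key=stacked)[: len(fit) // 10]

for h, b in zip(bandwidths, bases):
    print(f"KDE h={h}: held-out ANLL = {anll(b, held_out):.3f}")
print(f"stacked: held-out ANLL = {anll(stacked, held_out):.3f}")
```

Because the weights are non-negative and sum to one, the stacked estimate remains a valid density. An oversampling step in the spirit of SDE-OS would then generate synthetic instances around the flagged rare values; how the paper does this is not stated in the abstract, so it is left out of the sketch.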

Pages: 23
