Fuzzy neuron modeling of incomplete data for missing value imputation

Cited by: 3

Authors:

Zhang, Zheng [2]

Yan, Xiaoming [1]

Zhang, Liyong [1]

Lai, Xiaochen [2]

Lu, Wei [1]

Affiliations:

[1] Dalian Univ Technol, Sch Control Sci & Engn, Dalian 116024, Peoples R China

[2] Dalian Univ Technol, Sch Software, Dalian 116600, Peoples R China

Source:

INFORMATION SCIENCES | 2024 / Vol. 659

Funding:

National Natural Science Foundation of China;

Keywords:

Incomplete data; Missing value imputation; Category-based modeling; Tracking-removed autoencoder; Iterative learning; C-MEANS; REGRESSION; IDENTIFICATION; SYSTEMS;

DOI:

10.1016/j.ins.2023.120065

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Missing values are a common and unavoidable problem in many real-world datasets, and modeling incomplete data so that missing values can be imputed reasonably is a challenging task. This paper focuses on regression imputation and uses a tracking-removed autoencoder (TRAE) to construct the mutual fitting correlation on incomplete data. Because regression relationships differ across sample categories, we introduce the Takagi-Sugeno (TS) fuzzy architecture and propose a category-based tracking-removed autoencoder (TS-TRAE) to model incomplete data for missing value imputation. The TS-TRAE model partitions the incomplete dataset into several subclusters using membership information obtained from fuzzy clustering, then establishes a TRAE-based submodel to mine the relationships within each subcluster for precise modeling of incomplete data. To make full use of all existing values during model training, we treat missing values as variables and propose an iterative learning method that optimizes the missing variables and the network parameters collaboratively. This method allows incomplete samples to participate in model training while also imputing their missing values. The TS-TRAE model effectively integrates the inner category structure of incomplete data with its attribute association features. Experimental results verify the effectiveness of the proposed method.
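The idea of treating missing values as variables and optimizing them together with the model parameters can be illustrated with a minimal sketch. This is not the authors' TS-TRAE implementation: a single linear map with a zero diagonal stands in for the tracking-removed autoencoder (each attribute is predicted only from the other attributes), there is no fuzzy clustering or TS combination of submodels, and the function name `impute_jointly`, the squared-error loss over all entries, and the step sizes are assumptions made for illustration.

```python
import numpy as np

def impute_jointly(X, n_iter=500, lr_w=0.05, lr_z=0.1, seed=0):
    """Impute NaN entries of X by gradient descent on both the model
    parameters and the missing entries (illustrative stand-in only)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    n, d = X.shape

    # Start the missing variables at the observed column means.
    Z = np.where(observed, X, np.nanmean(X, axis=0))

    rng = np.random.default_rng(seed)
    off_diag = 1.0 - np.eye(d)          # zero diagonal: attribute j never feeds output j
    W = rng.normal(scale=0.01, size=(d, d)) * off_diag
    b = np.zeros(d)

    for _ in range(n_iter):
        pred = Z @ W + b                # reconstruct each attribute from the others
        err = pred - Z                  # reconstruction error on every entry

        # Gradients of 0.5 * mean-over-samples squared error w.r.t. W and b.
        grad_W = (Z.T @ err) / n * off_diag
        grad_b = err.mean(axis=0)
        # Per-sample gradient w.r.t. the data matrix: Z is both input and target.
        grad_Z = err @ W.T - err

        W -= lr_w * grad_W
        b -= lr_w * grad_b
        # Only missing entries are variables; observed values stay fixed.
        Z = np.where(observed, Z, Z - lr_z * grad_Z)

    return Z
```

Calling `impute_jointly(X)` on an array whose missing entries are `NaN` returns a filled copy in which the observed values are unchanged. In the category-based setting described in the abstract, one such submodel would be fitted per subcluster and the outputs combined using the fuzzy memberships; that step is omitted here.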


Pages: 19

References: 49 in total

