A population spatialization method based on the integration of feature selection and an improved random forest model

Authors
Zhao, Zhen [1]
Guo, Hongmei [1]
Jiang, Xueli [2]
Zhang, Ying [1]
Lu, Changjiang [1]
Zhang, Can [1]
He, Zonghang [1]
Affiliations
[1] Seismol Bur Sichuan Prov, Chengdu, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Chengdu, Sichuan, Peoples R China
Source
PLOS ONE | 2025 / Vol. 20 / Issue 04
Keywords
NIGHTTIME LIGHT; LAND-USE; REGRESSION; DENSITY; SUPPORT; CHINA; IMAGERY
DOI
10.1371/journal.pone.0321263
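Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract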
Ascertaining the accurate spatial distribution of population is essential for effective urban planning, resource allocation, and emergency rescue planning. The random forest (RF) model is widely used in population spatialization studies. However, the complexity of population distribution characteristics and the limitations of the RF model in processing imbalanced datasets affect population prediction accuracy. To address these issues, a population spatialization model that integrates feature selection with an improved random forest is proposed herein. First, recursive feature elimination with cross-validation (RFECV), the maximal information coefficient (MIC), and mean decrease accuracy (MDA) were used to select population distribution feature factors. Random forest models were then constructed using the feature subsets selected by each method, namely MIC-RF, RFECV-RF, and MDA-RF. Subsequently, the feature factors corresponding to the model with the highest accuracy were selected as the optimal feature subset and used as input data for model construction. Additionally, considering the imbalance in the spatial distribution of population, we used the K-means++ clustering algorithm to cluster the optimal feature subset, then used bootstrap sampling to extract the same amount of data from each cluster and fused it with the training subset to build an improved random forest model. Based on this model, a spatial population distribution dataset of the Southern Sichuan Economic Zone at a 500 m resolution was generated. Finally, the population dataset generated in this study was compared and validated against the WorldPop dataset. The results showed that feature selection improved model accuracy to varying degrees compared with an RF based on all factors, and MDA-RF had the lowest MAPE (0.174) and the highest R² (0.913) among them.
Therefore, the feature factors selected using the MDA method were taken as the optimal feature subset. Compared with MDA-RF, the prediction accuracy of the improved RF built on the same subset increased by 1.7%, indicating that modifying the bootstrap sampling of the random forest with the K-means++ clustering algorithm can enhance model accuracy to some extent. The proposed method was also more accurate than the WorldPop dataset: the MRE and RMSE of the WorldPop dataset were 57.24 and 23174.98, respectively, whereas those of the proposed method were 25.00 and 15776.50. This implies that the proposed method can simulate the spatial distribution of population more accurately.
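The cluster-balanced sampling step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names (`kmeans_pp_labels`, `cluster_balanced_sample`), the parameters `k` and `n_per_cluster`, the toy data, and the choice to append the per-cluster bootstrap draws to the whole training set are all assumptions made for illustration. The abstract states only that the same amount of data is drawn from each K-means++ cluster and fused with the training subset.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_labels(points, k, rng, n_iter=20):
    """Cluster points with k-means++ seeding followed by Lloyd iterations."""
    # Seeding: the first center is uniform at random; each later center is
    # drawn with probability proportional to the squared distance from the
    # point to its nearest already-chosen center.
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])

    def nearest(p):
        return min(range(k), key=lambda j: sq_dist(p, centers[j]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return [nearest(p) for p in points]


def cluster_balanced_sample(train, k, n_per_cluster, seed=0):
    """Append n_per_cluster bootstrap draws from every cluster to the training set.

    `train` is a list of (features, target) pairs; clustering uses features only.
    (Hypothetical helper: the fusion rule is an assumption, see the text above.)
    """
    rng = random.Random(seed)
    labels = kmeans_pp_labels([x for x, _ in train], k, rng)
    extra = []
    for j in range(k):
        members = [row for row, lab in zip(train, labels) if lab == j]
        if members:
            # Sampling with replacement, so small clusters can still
            # contribute the same number of rows as large ones.
            extra.extend(rng.choices(members, k=n_per_cluster))
    return train + extra, labels


# Toy, made-up data: 90 sparsely populated cells and 10 densely populated ones.
train = [((i * 0.01, 0.0), 50.0) for i in range(90)]
train += [((100.0 + i * 0.01, 0.0), 9000.0) for i in range(10)]
augmented, labels = cluster_balanced_sample(train, k=2, n_per_cluster=20)
print(len(train), len(augmented))
```

In this toy case the minority cluster holds 10% of the original rows but contributes half of the appended rows, which is the rebalancing effect the abstract attributes to the improved bootstrap. The augmented rows would then be passed to whichever random forest regressor is in use.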
Pages: 25
Related papers
50 in total
  • [1] A Population Spatialization Model at the Building Scale Using Random Forest
    Wang, Mengqi
    Wang, Yinglin
    Li, Bozhao
    Cai, Zhongliang
    Kang, Mengjun
    REMOTE SENSING, 2022, 14 (08)
  • [2] An Improved Feature Selection Method Based on Random Forest Algorithm for Wind Turbine Condition Monitoring
    Li, Guo
    Wang, Chensheng
    Zhang, Di
    Yang, Guang
    SENSORS, 2021, 21 (16)
  • [3] A New Noisy Random Forest Based Method for Feature Selection
    Akhiat, Yassine
    Manzali, Youness
    Chahhou, Mohamed
    Zinedine, Ahmed
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2021, 21 (02) : 10 - 28
  • [4] Intrusion Detection Model Based on Feature Selection and Random Forest
    Dong, Rui Hong
    Shui, Yong Li
    Zhang, Qiu Yu
    International Journal of Network Security, 2021, 23 (06) : 985 - 996
  • [5] Spatialization of Population in the Bohai Rim Region Using Random Forest Model
    Gao X.
    Yang X.
    Chen B.
    Lin L.
    Journal of Geo-Information Science, 2022, 24 (06) : 1150 - 1162
  • [6] Population spatialization in Zhengzhou city based on multi-source data and random forest model
    Liu, Lingling
    Cheng, Gang
    Yang, Jie
    Cheng, Yushu
    FRONTIERS IN EARTH SCIENCE, 2023, 11
  • [7] Random forest-based nonlinear improved feature extraction and selection for fault classification
    Fezai, Radhia
    Bouzrara, Kais
    Mansouri, Majdi
    Nounou, Hazem
    Nounou, Mohamed
    Trabelsi, Mohamed
    2021 18TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2021 : 601 - 606
  • [8] Feature selection algorithm based on random forest
    Yao, Deng-Ju
    Yang, Jing
    Zhan, Xiao-Juan
    Jilin Daxue Xuebao (Gongxueban)/Journal of Jilin University (Engineering and Technology Edition), 2014, 44 (01) : 137 - 141
  • [9] TEHRAN AIR POLLUTANTS PREDICTION BASED ON RANDOM FOREST FEATURE SELECTION METHOD
    Shamsoddini, A.
    Aboodi, M. R.
    Karami, J.
    ISPRS INTERNATIONAL JOINT CONFERENCES OF THE 2ND GEOSPATIAL INFORMATION RESEARCH (GI RESEARCH 2017); THE 4TH SENSORS AND MODELS IN PHOTOGRAMMETRY AND REMOTE SENSING (SMPR 2017); THE 6TH EARTH OBSERVATION OF ENVIRONMENTAL CHANGES (EOEC 2017), 2017, 42-4 (W4) : 483 - 488
  • [10] An Integration of feature extraction and Guided Regularized Random Forest feature selection for Smartphone based Human Activity Recognition
    Thakur, Dipanwita
    Biswas, Suparna
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2022, 204