A population spatialization method based on the integration of feature selection and an improved random forest model

Cited by: 0
Authors
Zhao, Zhen [1]
Guo, Hongmei [1]
Jiang, Xueli [2]
Zhang, Ying [1]
Lu, Changjiang [1]
Zhang, Can [1]
He, Zonghang [1]
Affiliations
[1] Seismol Bur Sichuan Prov, Chengdu, Sichuan, Peoples R China
[2] Southwest Jiaotong Univ, Chengdu, Sichuan, Peoples R China
Source
PLOS ONE | 2025 / Vol. 20 / Issue 04
Keywords
NIGHTTIME LIGHT; LAND-USE; REGRESSION; DENSITY; SUPPORT; CHINA; IMAGERY;
DOI
10.1371/journal.pone.0321263
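Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code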
07 ; 0710 ; 09 ;
Abstract
Accurately determining the spatial distribution of population is essential for effective urban planning, resource allocation, and emergency rescue planning. The random forest (RF) model is widely used in population spatialization studies. However, the complexity of population distribution characteristics and the limitations of the RF model in handling imbalanced datasets reduce population prediction accuracy. To address these issues, a population spatialization model that integrates feature selection with an improved random forest is proposed herein. First, three methods, recursive feature elimination with cross-validation (RFECV), the maximal information coefficient (MIC), and mean decrease accuracy (MDA), were used to select population distribution feature factors. A random forest was then built on the feature subset produced by each method, giving three models: MIC-RF, RFECV-RF, and MDA-RF. The feature factors of the most accurate model were taken as the optimal feature subset and used as input data for model construction. Additionally, to account for the imbalance in the spatial distribution of population, the K-means++ clustering algorithm was used to cluster the optimal feature subset, and bootstrap sampling was used to draw the same amount of data from each cluster and fuse it with the training subset to build an improved random forest model. Based on this model, a spatial population distribution dataset of the Southern Sichuan Economic Zone at a 500 m resolution was generated. Finally, the generated population dataset was compared with and validated against the WorldPop dataset. The results showed that feature selection improved model accuracy to varying degrees compared with an RF built on all factors; among the three models, MDA-RF had the lowest MAPE (0.174) and the highest R² (0.913). Therefore, the feature factors selected by the MDA method were taken as the optimal feature subset. Compared with MDA-RF, the prediction accuracy of the improved RF built on the same subset increased by 1.7%, indicating that modifying the bootstrap sampling of the random forest with the K-means++ clustering algorithm can enhance model accuracy to some extent. The proposed method was also more accurate than the WorldPop dataset: the MRE and RMSE of the WorldPop dataset were 57.24 and 23174.98, respectively, while those of the proposed method were 25.00 and 15776.50. This implies that the proposed method can simulate the spatial distribution of population more accurately.
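The cluster-balanced bootstrap step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that cluster labels for the training rows have already been computed (for example by K-means++), and the function name `cluster_balanced_bootstrap` and the parameter `n_per_cluster` are hypothetical. The sketch draws an ordinary bootstrap sample and then appends the same number of rows from every cluster, so that sparsely populated clusters are represented in each tree's training data.

```python
import random
from collections import defaultdict


def cluster_balanced_bootstrap(labels, n_per_cluster, seed=0):
    """Return row indices for one tree's training sample.

    labels: cluster id of each training row (assumed to be precomputed,
            e.g. by K-means++ on the selected feature subset).
    n_per_cluster: hypothetical parameter, rows drawn from every cluster.
    """
    rng = random.Random(seed)
    n = len(labels)

    # Ordinary bootstrap: n rows drawn with replacement from the whole set.
    base = [rng.randrange(n) for _ in range(n)]

    # Group row indices by cluster id.
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)

    # Draw the same number of rows, with replacement, from each cluster.
    extra = []
    for lab in sorted(by_cluster):
        extra.extend(rng.choices(by_cluster[lab], k=n_per_cluster))

    # Fuse the balanced draw with the ordinary bootstrap sample.
    return base + extra


# Toy labels: one dominant cluster and two small ones (illustrative only).
labels = [0] * 90 + [1] * 8 + [2] * 2
sample = cluster_balanced_bootstrap(labels, n_per_cluster=5, seed=42)
```

In this toy case each tree would see at least five rows from the two-row cluster, whereas a plain bootstrap could miss that cluster entirely. How the paper sizes the per-cluster draw is not stated in the abstract.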
Pages: 25
Related papers
50 in total
  • [21] Distance Correlation-Based Feature Selection in Random Forest
    Ratnasingam, Suthakaran
    Munoz-Lopez, Jose
    ENTROPY, 2023, 25 (09)
  • [22] Random Forest-based feature selection for emotion recognition
    Gharsalli, Sonia
    Emile, Bruno
    Laurent, Helene
    Desquesnes, Xavier
    Vivet, Damien
    5TH INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, THEORY, TOOLS AND APPLICATIONS 2015, 2015, : 268 - 272
  • [23] Fault Line Selection and Location of Distribution Network Based on Improved Random Forest Method
    Ru, Jiaxin
    Luo, Guomin
    Shang, Boyang
    Luo, Simin
    Liu, Wenlin
    Wang, Shaoliang
    2022 4TH INTERNATIONAL CONFERENCE ON SMART POWER & INTERNET ENERGY SYSTEMS, SPIES, 2022, : 1179 - 1184
  • [24] A Feature Selection Method based on Improved TFIDF
    Wei Yong-qing
    Liu Pei-yu
    Zhu Zhen-fang
    2008 3RD INTERNATIONAL CONFERENCE ON PERVASIVE COMPUTING AND APPLICATIONS, VOLS 1 AND 2, 2008, : 94 - +
  • [25] Efficient Feature Selection Method Using Contribution Ratio by Random Forest
    Murata, Ryuei
    Mishina, Yohei
    Yamauchi, Yuji
    Yamashita, Takayoshi
    Fujiyoshi, Hironobu
    2015 21ST KOREA-JAPAN JOINT WORKSHOP ON FRONTIERS OF COMPUTER VISION, 2015,
  • [26] Room Occupancy Detection Based on Random Forest with Timestamp Features and ANOVA Feature Selection Method
    Alam S.
    Sari R.M.
    Alfian G.
    Farooq U.
    J. Comput. Sci. Eng., 2024, 1 : 10 - 18
  • [27] Optimal Feature Selection for Partial Discharge Recognition of Cable Systems Based on the Random Forest Method
    Peng, Xiaosheng
    Yang, Guangyao
    Zheng, Shijie
    Xiong, Lei
    Bai, Junyang
    2016 CHINA INTERNATIONAL CONFERENCE ON ELECTRICITY DISTRIBUTION (CICED), 2016,
  • [28] Hidden AS link prediction based on random forest feature selection and GWO-XGBoost model
    Wang, Zekang
    Yuan, Fuxiang
    Li, Ruixiang
    Zhang, Meng
    Luo, Xiangyang
    COMPUTER NETWORKS, 2025, 262
  • [29] An Innovative NOx Emissions Prediction Model Based on Random Forest Feature Selection and Evolutionary Reformer
    Meng, Xianyu
    Li, Xi
    Chen, Jialei
    Fu, Yongyan
    Zhang, Chu
    Nazir, Muhammad Shahzad
    Peng, Tian
    PROCESSES, 2025, 13 (01)
  • [30] An Improved Filtering Method of Superfluous Relationship in Domain Model Based on Random Forest
    Yu, Mengyuan
    Wang, Lisong
    Cao, Buzhan
    Hong, Yang
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, VOL. 1, 2022, 878 : 857 - 862