A random forest classifier with cost-sensitive learning to extract urban landmarks from an imbalanced dataset

Cited by: 15
Authors
Kang, Mengjun [1]
Liu, Yue [1]
Wang, Mengqi [1]
Li, Lin [1]
Weng, Min [1]
Affiliations
[1] Wuhan Univ, Sch Resource & Environm Sci, Wuhan, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Urban landmark; salience; random forest; class imbalance; cost-sensitive ensemble; ENVIRONMENT; SALIENCE; SMOTE;
DOI
10.1080/13658816.2021.1977814
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Urban landmarks play an important role as spatial references in spatial cognition, navigation, map design and urban planning. However, current landmark extraction methods do not consider the imbalance between landmark and non-landmark samples in a dataset, so the extraction results are biased toward the majority class, resulting in poor classification performance for the minority class. This study introduces a random forest (RF) classifier combined with cost-sensitive learning to extract urban landmarks automatically from a basic spatial database. First, the optimal feature set is determined according to the importance of the features. Next, a cost-sensitive RF algorithm is applied to extract landmarks; it determines the misclassification cost according to the class distribution and weights each decision tree by its classification results. The method performs well, with recall and area under the ROC curve (AUC) both greater than 90%, and the model is also applicable to small sample sets, which can reduce the cost of manual labor.
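The two steps named in the abstract (importance-based feature selection, then a cost-sensitive RF whose misclassification cost follows the class distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn, uses synthetic data in place of a spatial database, approximates the cost setting with `class_weight="balanced"`, and does not reproduce the per-tree weighting. The value `top_k = 6` and all variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a landmark dataset: roughly 5% positives ("landmarks").
X, y = make_classification(
    n_samples=2000, n_features=12, n_informative=5,
    weights=[0.95, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: rank features by impurity-based importance and keep the top k.
ranker = RandomForestClassifier(n_estimators=200, random_state=0)
ranker.fit(X_train, y_train)
top_k = 6  # hypothetical choice; the abstract does not state the feature count
selected = np.argsort(ranker.feature_importances_)[::-1][:top_k]

# Step 2: cost-sensitive forest. class_weight="balanced" weights each class
# inversely to its frequency, so errors on the rare class cost more.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=0
)
model.fit(X_train[:, selected], y_train)

# Evaluate with the two metrics reported in the abstract.
pred = model.predict(X_test[:, selected])
scores = model.predict_proba(X_test[:, selected])[:, 1]
recall = recall_score(y_test, pred)
auc = roc_auc_score(y_test, scores)
print(f"recall={recall:.3f}  AUC={auc:.3f}")
```

The printed scores come from synthetic data and say nothing about the paper's reported results; they only show how recall and AUC would be computed for the minority class.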
Pages: 496-513
Number of pages: 18