Measuring Housing Vitality from Multi-Source Big Data and Machine Learning

Cited by: 3
Authors
Zhou, Yang [1, 3, 4]
Xue, Lirong [2 ]
Shi, Zhengyu [3 ]
Wu, Libo [1, 4]
Fan, Jianqing [2 ]
Affiliations
[1] Fudan Univ, Inst Big Data, Shanghai, Peoples R China
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[4] Fudan Univ, MOE Lab Natl Dev & Intelligent Governance, Shanghai, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Computational social science; Factor model; FarmPredict; Housing vitality; Machine learning; VACANCY; NUMBER; BOOM
DOI
10.1080/01621459.2022.2096038
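Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract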
Measuring timely high-resolution socioeconomic outcomes is critical for policymaking and evaluation, but hard to reliably obtain. With the help of machine learning and cheaply available data such as social media and nightlight, it is now possible to predict such indices in fine granularity. This article demonstrates an adaptive way to measure the time trend and spatial distribution of housing vitality (number of occupied houses) with the help of multiple easily accessible datasets: energy, nightlight, and land-use data. We first identified the high-frequency housing occupancy status from energy consumption data and then matched it with the monthly nightlight data. We then introduced the Factor-Augmented Regularized Model for prediction (FarmPredict) to deal with the dependence and collinearity issue among predictors by effectively lifting the prediction space, which is suitable to most machine learning algorithms. The heterogeneity issue in big data analysis is mitigated through the land-use data. FarmPredict allows us to extend the regional results to the city level, with a 76% out-of-sample explanation of the spatial and timeliness variation in the house usage. Since energy is indispensable for life, our method is highly transferable with the only requirement of publicly accessible data. Our article provides an alternative approach with statistical machine learning to predict socioeconomic outcomes without the reliance on existing census and survey data. Supplementary materials for this article are available online.
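The factor-augmented idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes one common formulation: correlated predictors are split into a few principal-component factors plus idiosyncratic residuals, and the response is regressed on both, with an L1 penalty on the idiosyncratic coefficients only. This record does not specify the article's exact estimator, so the function names (`extract_factors`, `lasso_cd`, `farm_style_fit`), the penalty level, and the synthetic data below are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def extract_factors(X, k):
    """Split column-centered X (n x p) into k PCA factors F and residuals U."""
    Xc = X - X.mean(axis=0)
    left, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = left[:, :k] * s[:k]      # n x k estimated factor scores
    B = Vt[:k].T                 # p x k estimated loadings
    U = Xc - F @ B.T             # n x p idiosyncratic components
    return F, U

def lasso_cd(Z, y, lam, penalized, n_iter=200):
    """Coordinate descent for (1/2n)||y - Z b||^2 + lam * sum_{j penalized} |b_j|."""
    n, d = Z.shape
    b = np.zeros(d)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y.astype(float).copy()   # running residual y - Z b
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            r += Z[:, j] * b[j]  # remove coordinate j from the fit
            rho = Z[:, j] @ r / n
            if penalized[j]:
                # soft-thresholding update for penalized coordinates
                b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            else:
                b[j] = rho / col_sq[j]
            r -= Z[:, j] * b[j]  # add the updated coordinate back
    return b

def farm_style_fit(X, y, k, lam):
    """Regress centered y on [F, U]; only the U coefficients are penalized."""
    F, U = extract_factors(X, k)
    Z = np.hstack([F, U])
    penalized = np.array([False] * k + [True] * X.shape[1])
    coef = lasso_cd(Z, y - y.mean(), lam, penalized)
    return coef, F, U

# Synthetic example (hypothetical data): predictors driven by 2 latent factors.
rng = np.random.default_rng(0)
n, p, k = 200, 30, 2
F_true = rng.normal(size=(n, k))
B_true = rng.normal(size=(p, k))
X = F_true @ B_true.T + 0.5 * rng.normal(size=(n, p))
y = F_true @ np.array([1.0, -1.0]) + 0.1 * rng.normal(size=n)

coef, F, U = farm_style_fit(X, y, k=k, lam=0.05)
fitted = np.hstack([F, U]) @ coef + y.mean()
r2 = 1.0 - ((y - fitted) ** 2).sum() / ((y - y.mean()) ** 2).sum()
print("in-sample R^2:", round(float(r2), 3))
```

Because the estimated factors and the residuals are orthogonal by construction, the penalized step works on weakly correlated columns rather than on the original collinear predictors, which is the motivation the abstract gives for "lifting the prediction space". In practice the number of factors and the penalty level would be chosen by cross-validation or an information criterion.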
Pages: 1045-1059
Number of pages: 15