Measuring Housing Vitality from Multi-Source Big Data and Machine Learning

Cited by: 3
Authors
Zhou, Yang [1, 3, 4]
Xue, Lirong [2]
Shi, Zhengyu [3]
Wu, Libo [1, 4]
Fan, Jianqing [2]
Affiliations
[1] Fudan Univ, Inst Big Data, Shanghai, Peoples R China
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[4] Fudan Univ, MOE Lab Natl Dev & Intelligent Governance, Shanghai, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Computational social science; Factor model; FarmPredict; Housing vitality; Machine learning; VACANCY; NUMBER; BOOM
DOI
10.1080/01621459.2022.2096038
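Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714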
Abstract
Measuring timely high-resolution socioeconomic outcomes is critical for policymaking and evaluation, but hard to reliably obtain. With the help of machine learning and cheaply available data such as social media and nightlight, it is now possible to predict such indices in fine granularity. This article demonstrates an adaptive way to measure the time trend and spatial distribution of housing vitality (number of occupied houses) with the help of multiple easily accessible datasets: energy, nightlight, and land-use data. We first identified the high-frequency housing occupancy status from energy consumption data and then matched it with the monthly nightlight data. We then introduced the Factor-Augmented Regularized Model for prediction (FarmPredict) to deal with the dependence and collinearity issue among predictors by effectively lifting the prediction space, which is suitable to most machine learning algorithms. The heterogeneity issue in big data analysis is mitigated through the land-use data. FarmPredict allows us to extend the regional results to the city level, with a 76% out-of-sample explanation of the spatial and timeliness variation in the house usage. Since energy is indispensable for life, our method is highly transferable with the only requirement of publicly accessible data. Our article provides an alternative approach with statistical machine learning to predict socioeconomic outcomes without the reliance on existing census and survey data. Supplementary materials for this article are available online.
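The factor-augmented idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the general recipe of splitting correlated predictors into a few common factors plus idiosyncratic components and then fitting a regularized regression on both parts. This is not the authors' implementation: the names `farm_predict_sketch` and `lasso_cd`, the use of PCA to estimate factors, and the choice of a lasso penalty on the idiosyncratic part only are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(Z, y, lam, penalized, n_iter=200):
    """Coordinate descent for (1/2n)||y - Z b||^2 + lam * sum of |b_j| over penalized j."""
    n, p = Z.shape
    beta = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += Z[:, j] * beta[j]          # remove column j's contribution
            rho = Z[:, j] @ resid / n
            if penalized[j]:
                beta[j] = soft_threshold(rho, lam) / col_sq[j]
            else:
                beta[j] = rho / col_sq[j]       # unpenalized least-squares update
            resid -= Z[:, j] * beta[j]          # add the updated contribution back
    return beta

def farm_predict_sketch(X, y, n_factors, lam):
    """Hypothetical helper: factor-augmented regularized regression (illustrative only)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Estimate loadings by PCA (top right singular vectors of the centered predictors).
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_factors].T                 # shape (p, K)
    F = Xc @ loadings                           # estimated common factors, shape (n, K)
    U = Xc - F @ loadings.T                     # idiosyncratic components, shape (n, p)
    Z = np.hstack([F, U])                       # lifted predictor space
    penalized = np.array([False] * n_factors + [True] * X.shape[1])
    coef = lasso_cd(Z, y - y_mean, lam, penalized)

    def predict(X_new):
        Xn = X_new - x_mean
        Fn = Xn @ loadings
        Un = Xn - Fn @ loadings.T
        return y_mean + np.hstack([Fn, Un]) @ coef

    return predict
```

In this sketch the factor coefficients are left unpenalized so that the strong common variation is always captured, while the lasso selects among the weakly correlated idiosyncratic components. The number of factors and the penalty level would need to be chosen from the data, for example by cross-validation.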
Pages: 1045-1059
Number of pages: 15
Related papers (first 10 of 50)
  • [1] Comments on "Measuring Housing Vitality from Multi-Source Big Data and Machine Learning"
    Tu, Wei
    Jiang, Bei
    Kong, Linglong
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (539) : 1060 - 1062
  • [2] Discussion of "Measuring Housing Vitality from Multi-Source Big Data and Machine Learning"
    Banerjee, Sudipto
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2022, 117 (539) : 1063 - 1065
  • [3] Machine Learning Modeling of Vitality Characteristics in Historical Preservation Zones with Multi-Source Data
    Huang, Xiaoran
    Gong, Pixin
    Wang, Siyan
    White, Marcus
    Zhang, Bo
    BUILDINGS, 2022, 12 (11)
  • [4] Learning from multi-source data
    Fromont, E
    Cordier, MO
    Quiniou, R
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2004, PROCEEDINGS, 2004, 3202 : 503 - 505
  • [5] Comprehensive Vitality Evaluation of Urban Blocks based on Multi-source Geographic Big Data
    Tang L.
    Xu H.
    Ding Y.
    Journal of Geo-Information Science, 2022, 24 (08) : 1575 - 1588
  • [6] Exploring the Impact of Urban Amenities on Business Circle Vitality Using Multi-Source Big Data
    Ji, Yi
    Wang, Zilong
    Zhu, Dan
    LAND, 2024, 13 (10)
  • [7] An Evaluation of Street Dynamic Vitality and Its Influential Factors Based on Multi-Source Big Data
    Guo, Xin
    Chen, Hongfei
    Yang, Xiping
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2021, 10 (03)
  • [8] Application of English education big data system based on multi-source information fusion and machine learning
    Du, Kehan
    SOFT COMPUTING, 2023
  • [9] Understanding house price appreciation using multi-source big geo-data and machine learning
    Kang, Yuhao
    Zhang, Fan
    Peng, Wenzhe
    Gao, Song
    Rao, Jinmeng
    Duarte, Fabio
    Ratti, Carlo
    LAND USE POLICY, 2021, 111
  • [10] Mapping Himalayan leucogranites by machine learning using multi-source data
    Wang Z.
    Zuo R.
    Earth Science Frontiers, 2023, 30 (05) : 216 - 226