Measuring Housing Vitality from Multi-Source Big Data and Machine Learning

Cited by: 3
Authors
Zhou, Yang [1, 3, 4]
Xue, Lirong [2]
Shi, Zhengyu [3]
Wu, Libo [1, 4]
Fan, Jianqing [2]
Affiliations
[1] Fudan Univ, Inst Big Data, Shanghai, Peoples R China
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[4] Fudan Univ, MOE Lab Natl Dev & Intelligent Governance, Shanghai, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Computational social science; Factor model; FarmPredict; Housing vitality; Machine learning; VACANCY; NUMBER; BOOM;
DOI
10.1080/01621459.2022.2096038
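Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract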
Timely, high-resolution measures of socioeconomic outcomes are critical for policymaking and evaluation but are hard to obtain reliably. With the help of machine learning and cheaply available data such as social media and nightlight, it is now possible to predict such indices at fine granularity. This article demonstrates an adaptive way to measure the time trend and spatial distribution of housing vitality (the number of occupied houses) using multiple easily accessible datasets: energy, nightlight, and land-use data. We first identified the high-frequency housing occupancy status from energy consumption data and then matched it with monthly nightlight data. We then introduced the Factor-Augmented Regularized Model for prediction (FarmPredict), which addresses dependence and collinearity among predictors by effectively lifting the prediction space and is compatible with most machine learning algorithms. Heterogeneity, a common issue in big data analysis, is mitigated using the land-use data. FarmPredict allows us to extend the regional results to the city level, explaining 76% of the out-of-sample spatial and temporal variation in house usage. Because energy is indispensable to daily life, our method is highly transferable, requiring only publicly accessible data. Our article provides an alternative, statistical machine learning approach to predicting socioeconomic outcomes without relying on existing census and survey data. Supplementary materials for this article are available online.
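The FarmPredict step described in the abstract can be illustrated with a minimal sketch. It assumes a purely linear setting: latent factors are estimated by principal components from a predictor matrix X (for instance, nightlight features), the response y (for instance, occupied-house counts) is regressed on the estimated factors without a penalty and on the estimated idiosyncratic components with a lasso, and accuracy is scored by out-of-sample R². The function names (fit_factors, lasso_cd, farm_fit, farm_predict, oos_r2), the number of factors k, the penalty lam, and the use of a lasso are hypothetical choices for illustration; the abstract says only that FarmPredict works with most machine learning algorithms, so this is not the authors' implementation.

```python
# Hypothetical sketch of a factor-augmented regularized prediction (linear case).
# Assumptions: PCA factors, OLS on factors, lasso on idiosyncratic components.
import numpy as np


def fit_factors(Xc, k):
    """PCA estimate of Xc = F B^T + U for a column-centred Xc (n x p)."""
    n = Xc.shape[0]
    left, _, _ = np.linalg.svd(Xc, full_matrices=False)
    F = np.sqrt(n) * left[:, :k]   # factor scores, normalised so F^T F / n = I_k
    B = Xc.T @ F / n               # loadings (p x k)
    U = Xc - F @ B.T               # idiosyncratic parts, orthogonal to F in sample
    return F, B, U


def lasso_cd(Z, y, lam, n_iter=200):
    """Coordinate-descent lasso: minimise ||y - Z b||^2 / (2n) + lam * ||b||_1."""
    n, p = Z.shape
    b = np.zeros(p)
    col_sq = (Z ** 2).sum(axis=0) / n
    r = y.copy()                   # residual y - Z b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += Z[:, j] * b[j]    # remove feature j's current contribution
            rho = Z[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= Z[:, j] * b[j]    # add back the updated contribution
    return b


def farm_fit(X, y, k, lam):
    """Fit on the lifted features (F, U): unpenalised on F, lasso on U."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    F, B, U = fit_factors(X - x_mean, k)
    yc = y - y_mean
    gamma = F.T @ yc / X.shape[0]  # OLS on F, valid because F^T F / n = I_k
    beta = lasso_cd(U, yc - F @ gamma, lam)
    return dict(x_mean=x_mean, y_mean=y_mean, B=B, gamma=gamma, beta=beta)


def farm_predict(model, X_new):
    """Lift new rows into (factor, idiosyncratic) space and predict."""
    Xc = X_new - model["x_mean"]
    # Recover factors of new rows by least-squares projection on the loadings.
    F_new = np.linalg.lstsq(model["B"], Xc.T, rcond=None)[0].T
    U_new = Xc - F_new @ model["B"].T
    return model["y_mean"] + F_new @ model["gamma"] + U_new @ model["beta"]


def oos_r2(y_true, y_pred):
    """Out-of-sample R^2 = 1 - SSE / SST on held-out data."""
    sse = np.sum((y_true - y_pred) ** 2)
    sst = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - sse / sst
```

Because the estimated idiosyncratic components are orthogonal in sample to the estimated factors, the joint fit on the lifted features (F, U) separates into an unpenalised least-squares step for the factors and a lasso step for U. In the spirit of the abstract, the lasso could be replaced by another learner trained on the same lifted features.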
Pages: 1045-1059
Number of pages: 15
Related papers
50 in total
  • [1] Machine Learning Modeling of Vitality Characteristics in Historical Preservation Zones with Multi-Source Data
    Huang, Xiaoran
    Gong, Pixin
    Wang, Siyan
    White, Marcus
    Zhang, Bo
    BUILDINGS, 2022, 12 (11)
  • [2] Application of English education big data system based on multi-source information fusion and machine learning
    Du, Kehan
    SOFT COMPUTING, 2023
  • [3] Mapping Himalayan leucogranites by machine learning using multi-source data
    Wang Z.
    Zuo R.
    Earth Science Frontiers, 2023, 30 (05) : 216 - 226
  • [4] Recent trends of machine learning applied to multi-source data of medicinal plants
    Zhang, Yanying
    Wang, Yuanzhong
    JOURNAL OF PHARMACEUTICAL ANALYSIS, 2023, 13 (12) : 1388 - 1407
  • [5] Dynamic Maize Yield Predictions Using Machine Learning on Multi-Source Data
    Croci, Michele
    Impollonia, Giorgio
    Meroni, Michele
    Amaducci, Stefano
    REMOTE SENSING, 2023, 15 (01)
  • [6] A Machine Learning Approach for Convective Initiation Detection Using Multi-source Data
    Liu, Xuan
    Chen, Haonan
    Han, Lei
    Ge, Yurong
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022 : 6518 - 6521
  • [7] MACHINE LEARNING-BASED ECONOMIC DEVELOPMENT MAPPING FROM MULTI-SOURCE OPEN GEOSPATIAL DATA
    Cao, Rui
    Tu, Wei
    Cai, Jixuan
    Zhao, Tianhong
    Xiao, Jie
    Cao, Jinzhou
    Gao, Qili
    Su, Hanjing
    XXIV ISPRS CONGRESS IMAGING TODAY, FORESEEING TOMORROW, COMMISSION IV, 2022, 5-4 : 259 - 266
  • [8] Enhanced SHL Recognition Using Machine Learning and Deep Learning Models with Multi-source Data
    Li, Mengyuan
    Zhu, Jun
    Zhang, Yuanyuan
    Lu, Xiaoling
    ADJUNCT PROCEEDINGS OF THE 2023 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING & THE 2023 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTING, UBICOMP/ISWC 2023 ADJUNCT, 2023 : 505 - 510
  • [9] Oil palm yield prediction across blocks from multi-source data using machine learning and deep learning
    Ang, Yuhao
    Shafri, Helmi Zulhaidi Mohd
    Lee, Yang Ping
    Bakar, Shahrul Azman
    Abidin, Haryati
    Junaidi, Mohd Umar Ubaydah Mohd
    Hashim, Shaiful Jahari
    Che'Ya, Nik Norasma
    Hassan, Mohd Roshdi
    San Lim, Hwee
    Abdullah, Rosni
    Yusup, Yusri
    Muhammad, Syahidah Akmal
    Teh, Sin Yin
    Samad, Mohd Na'aim
    EARTH SCIENCE INFORMATICS, 2022, 15 (04) : 2349 - 2367