Measuring Housing Vitality from Multi-Source Big Data and Machine Learning

Cited by: 3
Authors
Zhou, Yang [1, 3, 4]
Xue, Lirong [2]
Shi, Zhengyu [3]
Wu, Libo [1, 4]
Fan, Jianqing [2]
Affiliations
[1] Fudan Univ, Inst Big Data, Shanghai, Peoples R China
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[4] Fudan Univ, MOE Lab Natl Dev & Intelligent Governance, Shanghai, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Computational social science; Factor model; FarmPredict; Housing vitality; Machine learning; VACANCY; NUMBER; BOOM;
DOI
10.1080/01621459.2022.2096038
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Measuring timely high-resolution socioeconomic outcomes is critical for policymaking and evaluation, but hard to reliably obtain. With the help of machine learning and cheaply available data such as social media and nightlight, it is now possible to predict such indices in fine granularity. This article demonstrates an adaptive way to measure the time trend and spatial distribution of housing vitality (number of occupied houses) with the help of multiple easily accessible datasets: energy, nightlight, and land-use data. We first identified the high-frequency housing occupancy status from energy consumption data and then matched it with the monthly nightlight data. We then introduced the Factor-Augmented Regularized Model for prediction (FarmPredict) to deal with the dependence and collinearity issue among predictors by effectively lifting the prediction space, which is suitable to most machine learning algorithms. The heterogeneity issue in big data analysis is mitigated through the land-use data. FarmPredict allows us to extend the regional results to the city level, with a 76% out-of-sample explanation of the spatial and timeliness variation in the house usage. Since energy is indispensable for life, our method is highly transferable with the only requirement of publicly accessible data. Our article provides an alternative approach with statistical machine learning to predict socioeconomic outcomes without the reliance on existing census and survey data. Supplementary materials for this article are available online.
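The factor-augmented idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common textbook form in which correlated predictors are split by PCA into estimated common factors and idiosyncratic residuals, the response is regressed on the factors by least squares, and the remainder is fitted on the residuals with an L1 penalty. The function names (`farm_features`, `lasso_cd`, `farm_fit`) and the defaults for `n_factors` and `lam` are hypothetical and chosen only for illustration.

```python
import numpy as np


def farm_features(X, n_factors):
    """Split predictors into estimated factors F and idiosyncratic parts U via PCA."""
    Xc = X - X.mean(axis=0)                      # center each predictor
    left, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    F = left[:, :n_factors] * s[:n_factors]      # n x k factor scores
    B = Vt[:n_factors].T                         # p x k loadings
    U = Xc - F @ B.T                             # what the factors do not explain
    return F, U


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent lasso for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]           # remove coordinate j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * beta[j]           # add the updated coordinate back
    return beta


def farm_fit(X, y, n_factors=2, lam=0.1):
    """In-sample fit: least squares on factors, then lasso on idiosyncratic parts."""
    F, U = farm_features(X, n_factors)
    y_c = y - y.mean()
    gamma, *_ = np.linalg.lstsq(F, y_c, rcond=None)
    beta = lasso_cd(U, y_c - F @ gamma, lam)
    fitted = y.mean() + F @ gamma + U @ beta
    return fitted, gamma, beta
```

Because the PCA factors are orthogonal in sample to the residual matrix, the two-step fit separates the shared variation from the predictor-specific variation, which is what reduces the collinearity the penalized step would otherwise face. Predicting for new observations would additionally require projecting them onto the estimated loadings, which this sketch omits.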
Pages: 1045-1059
Page count: 15