Airbnb rental price modeling based on Latent Dirichlet Allocation and MESF-XGBoost composite model

Cited by: 14
Authors
Islam, Md Didarul [1]
Li, Bin [2]
Islam, Kazi Saiful [3]
Ahasan, Rakibul [4]
Mia, Md. Rimu [3]
Haque, Md Emdadul [5]
Affiliations
[1] George Mason Univ, Dept Geog & Geoinformat Sci, Fairfax, VA 22030 USA
[2] Cent Michigan Univ, Dept Geog & Environm Studies, Mt Pleasant, MI 48859 USA
[3] Khulna Univ, Urban & Rural Planning Discipline, Khulna 9208, Bangladesh
[4] Texas A&M Univ, Dept Geog, College Stn, TX 77843 USA
[5] Univ Barishal, Dept Geol & Min, Barishal 8254, Bangladesh
Source
MACHINE LEARNING WITH APPLICATIONS | 2022 / Vol. 7
Keywords
Machine Learning; Latent Dirichlet Allocation; Eigenvector Spatial Filtering; XGBoost; Spatial Data Modeling; Sharing Economy; Listings; Determinants
DOI
10.1016/j.mlwa.2021.100208
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Airbnb price modeling is an important decision-making tool that determines the acceptability and profitability of the service. In this study, we demonstrated how the description and location of an Airbnb listing can influence its price. We assumed that a proper description of a listed property positively influences the renter's decision making; therefore, we applied a Latent Dirichlet Allocation (LDA) based topic model to generate synthetic variables from the textual description of the property, aiming to improve price prediction accuracy. Additionally, we applied a Moran Eigenvector Spatial Filtering based XGBoost (MESF-XGBoost) model to address the spatial dependence of location data and improve prediction accuracy. Our study of the San Jose County Airbnb dataset found that the number of bedrooms, accommodations, property types, and the total number of reviews positively influence the listing price, whereas the absence of a super host badge and cancellation policy negatively influence the price. The experiment demonstrates that incorporating synthetic variables from both LDA and MESF into the model specification improves the prediction accuracy. The experiment also reveals that the XGBoost model with only non-spatial features is not strong enough to address spatial dependence; therefore, it cannot minimize spatial autocorrelation issues.
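The Moran Eigenvector Spatial Filtering step mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the k-nearest-neighbour weights, the value `k=4`, the 0.25 eigenvalue-ratio cutoff (a commonly used heuristic), the random coordinates, and all function names are assumptions made for illustration. The sketch builds a spatial weights matrix, double-centers it, and keeps the eigenvectors that represent positive spatial autocorrelation; those columns would then be appended to the non-spatial features before fitting a regressor such as XGBoost (not shown).

```python
import numpy as np

def knn_weights(coords, k=4):
    """Symmetric binary k-nearest-neighbour spatial weights (hypothetical choice)."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))      # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # a point is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]    # indices of the k closest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(W, W.T)                    # symmetrize the neighbour relation

def moran_eigenvectors(W, threshold=0.25):
    """Eigenvectors of the double-centered weights matrix M W M.

    Keeps eigenvectors whose eigenvalue exceeds `threshold` times the
    largest eigenvalue (assumed selection rule, not taken from the paper).
    """
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n          # centering projector
    eigvals, eigvecs = np.linalg.eigh(M @ W @ M) # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1]            # sort by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals / eigvals[0] > threshold
    return eigvecs[:, keep], eigvals[keep]

def morans_i(x, W):
    """Global Moran's I of vector x under weights W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Synthetic demo data: 200 random listing locations and 3 non-spatial features.
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(200, 2))
X_nonspatial = rng.normal(size=(200, 3))

W = knn_weights(coords, k=4)
E, lam = moran_eigenvectors(W)

# Augmented design matrix: original features plus spatial-filter eigenvectors.
X_augmented = np.hstack([X_nonspatial, E])
```

Each retained eigenvector is mean-zero and orthogonal to the others, so the added columns describe distinct map patterns; a tree-based model can then pick up location effects that the non-spatial features alone do not carry.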
Pages: 9