Imputing environmental impact missing data of the industrial sector for Chinese cities: A machine learning approach
Cited by: 20
Authors:
Chen, Xi [1]
Shuai, Chenyang [2,3,4]
Zhao, Bu [3,4]
Zhang, Yu [5]
Li, Kaijian [2]
Affiliations:
[1] Southwest Univ, Coll Econ & Management, Chongqing, Peoples R China
[2] Chongqing Univ, Sch Management Sci & Real Estate, Chongqing, Peoples R China
[3] Univ Michigan, Sch Environm & Sustainabil, Ann Arbor, MI USA
[4] Univ Michigan, Michigan Inst Computat Discovery & Engn, Ann Arbor, MI USA
[5] Chongqing Jiaotong Univ, Sch Econ & Management, Chongqing, Peoples R China
Data are the lifeblood of evidence-based decision-making and the raw material for accountability. Regularly collecting data to evaluate industrial consumption and pollution at the city level is not easy: it requires a significant investment of institutional and financial resources and engagement with a vast number of local governments. Although the Chinese government has devoted extensive human and financial resources to data collection, substantial data gaps remain. This study compared two traditional linear models and four machine learning models for computationally estimating missing data for six industrial consumption and pollution indicators (responses) of 701 cities from 2006 to 2018, using ten predictors. The results showed that the decision-tree-based extreme gradient boosting model developed here performed best among the six models. Across the six responses, the median coefficient of determination (R²) ranged from 0.85 to 0.94, and the median root mean squared error ranged from 8.5 to 17,776. This study provides high-quality, detailed data for industrial environmental analysis of Chinese cities. In addition, given its good generalization ability, the extreme gradient boosting model could be adapted to impute missing data for other environmental variables, in other sectors, and at even smaller scales.
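The evaluation idea in the abstract (fit a model on cities with observed values, predict the missing ones, and score the predictions with R² and root mean squared error) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, there is a single predictor rather than ten, and a one-variable least-squares line stands in for the study's models. It is not the authors' extreme gradient boosting model, and the helper names (`fit_line`, `impute`, `r2_score`, `rmse`) are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def impute(rows, a, b):
    """Fill missing responses (None) with model predictions; keep observed values."""
    return [(x, y if y is not None else a + b * x) for x, y in rows]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (unitless)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the units of the response."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Synthetic (predictor, response) pairs; None marks a missing response.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, None), (6.0, 11.8)]

# Fit on the observed rows only, then fill the gaps.
observed = [(x, y) for x, y in rows if y is not None]
a, b = fit_line([x for x, _ in observed], [y for _, y in observed])
filled = impute(rows, a, b)

# Score the fit on the observed rows (a real study would use held-out data).
y_obs = [y for _, y in observed]
y_hat = [a + b * x for x, _ in observed]
print("R2:", r2_score(y_obs, y_hat), "RMSE:", rmse(y_obs, y_hat))
```

Because RMSE carries the units of each response while R² is unitless, a spread such as 8.5 to 17,776 across six indicators says more about the indicators' different scales than about differences in fit quality; rescaling a response by 1,000 multiplies its RMSE by 1,000 and leaves R² unchanged.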