Random Forests for Global and Regional Crop Yield Predictions

Cited by: 379
Authors
Jeong, Jig Han [1]
Resop, Jonathan P. [2, 3]
Mueller, Nathaniel D. [4, 5]
Fleisher, David H. [3]
Yun, Kyungdahm [1]
Butler, Ethan E. [6]
Timlin, Dennis J. [3]
Shim, Kyo-Moon [7]
Gerber, James S. [8]
Reddy, Vangimalla R. [3]
Kim, Soo-Hyung [1]
Affiliations
[1] Univ Washington, Sch Environm & Forest Sci, Coll Environm, Box 354115, Seattle, WA 98195 USA
[2] Univ Maryland, Dept Geog Sci, College Pk, MD 20742 USA
[3] USDA ARS, Crop Syst & Global Change Lab, Beltsville, MD 20705 USA
[4] Harvard Univ, Dept Earth & Planetary Sci, 20 Oxford St, Cambridge, MA 02138 USA
[5] Harvard Univ, Dept Organism & Evolutionary Biol, Cambridge, MA 02138 USA
[6] Univ Minnesota, Dept Forest Resources, St Paul, MN 55108 USA
[7] RDA, Climate Change & Agroecol Div, Natl Inst Agr Sci, Suwon, South Korea
[8] Univ Minnesota, Inst Environm, St Paul, MN 55108 USA
Source
PLOS ONE | 2016 / Vol. 11 / Issue 06
Funding
National Science Foundation (United States);
Keywords
RANGE SHIFTS; CORN YIELDS; REGRESSION; CLASSIFICATION; TEMPERATURE; MODELS; RESPONSES; BIOMASS;
DOI
10.1371/journal.pone.0156571
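Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code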
07 ; 0710 ; 09 ;
Abstract
Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data.
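The abstract reports RMSE as a percentage of the average observed yield. A minimal sketch of that statistic follows, assuming the usual definition (root mean square error divided by the mean of the observations, times 100). The function name `rmse_percent_of_mean` and the sample yields are hypothetical and do not come from the paper.

```python
import math


def rmse_percent_of_mean(observed, predicted):
    """Return RMSE as a percentage of the mean observed value.

    Assumed definition: 100 * sqrt(mean((obs - pred)^2)) / mean(obs).
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over paired observations and predictions.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_obs = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_obs


# Hypothetical yields (t/ha), invented for illustration only.
observed = [8.0, 10.0, 12.0]
predicted = [9.0, 10.0, 11.0]
print(rmse_percent_of_mean(observed, predicted))
```

Dividing by the mean observed yield makes the error comparable across crops and regions with very different yield levels, which is presumably why the abstract can quote a single range for wheat, maize, and potato.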
Pages: 15