Smart "Predict, then Optimize"

Cited by: 281
Authors
Elmachtoub, Adam N. [1 ,2 ]
Grigas, Paul [3 ]
Affiliations
[1] Columbia Univ, Dept Ind Engn & Operat Res, New York, NY 10027 USA
[2] Columbia Univ, Data Sci Inst, New York, NY 10027 USA
[3] Univ Calif Berkeley, Dept Ind Engn & Operat Res, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
prescriptive analytics; data-driven optimization; machine learning; linear regression; SAMPLE AVERAGE APPROXIMATION; INVERSE OPTIMIZATION; CLASSIFICATION; DRIVEN; ALGORITHMS; ANALYTICS; CARE;
DOI
10.1287/mnsc.2020.3922
Chinese Library Classification (CLC)
C93 [Management];
Discipline classification codes
12; 1201; 1202; 120202;
Abstract
Many real-world analytics problems involve two significant challenges: prediction and optimization. Because each challenge is typically complex in its own right, the standard paradigm is predict-then-optimize. By and large, machine learning tools are designed to minimize prediction error and do not account for how the predictions will be used in the downstream optimization problem. In contrast, we propose a new and very general framework, called Smart "Predict, then Optimize" (SPO), which directly leverages the structure of the optimization problem (that is, its objective and constraints) to design better prediction models. A key component of our framework is the SPO loss function, which measures the decision error induced by a prediction. Because training a prediction model with respect to the SPO loss is computationally challenging, we use duality theory to derive a convex surrogate loss function, which we call the SPO+ loss. Most importantly, we prove that the SPO+ loss is statistically consistent with the SPO loss under mild conditions. Our SPO+ loss function can tractably handle any polyhedral, convex, or even mixed-integer optimization problem with a linear objective. Numerical experiments on shortest-path and portfolio-optimization problems show that the SPO framework can lead to significant improvements under the predict-then-optimize paradigm, particularly when the prediction model being trained is misspecified. We find that linear models trained with the SPO+ loss tend to dominate random-forest algorithms, even when the ground truth is highly nonlinear.
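
For concreteness, the following minimal Python sketch implements the two losses described in the abstract for a nominal problem min_{w in S} c^T w, under the simplifying assumption that the feasible region S is given by an explicit list of extreme points, so that the optimization oracle w*(c) reduces to an argmin over rows. The toy feasible set and cost vectors are hypothetical illustrations, not the paper's experimental setup.

# Sketch of the SPO and SPO+ losses for min_{w in S} c^T w, assuming S is
# represented by a finite list of extreme points (one per row of `vertices`).
import numpy as np

def oracle(c, vertices):
    """Optimization oracle w*(c): the vertex of S minimizing c^T w."""
    return vertices[np.argmin(vertices @ c)]

def spo_loss(c_hat, c, vertices):
    """Decision error: cost, under the true c, of acting on the prediction
    c_hat, minus the optimal cost c^T w*(c)."""
    return c @ oracle(c_hat, vertices) - c @ oracle(c, vertices)

def spo_plus_loss(c_hat, c, vertices):
    """Convex surrogate from the paper:
    max_{w in S} {(c - 2 c_hat)^T w} + 2 c_hat^T w*(c) - c^T w*(c)."""
    w_star = oracle(c, vertices)
    return np.max(vertices @ (c - 2 * c_hat)) + 2 * c_hat @ w_star - c @ w_star

def spo_plus_subgrad(c_hat, c, vertices):
    """A subgradient of the SPO+ loss in c_hat: 2 (w*(c) - w*(2 c_hat - c))."""
    return 2 * (oracle(c, vertices) - oracle(2 * c_hat - c, vertices))

# Toy example (hypothetical data): choose one of three feasible decisions.
S = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
c_true = np.array([1.0, 3.0, 2.0])   # realized costs
c_pred = np.array([2.5, 1.0, 2.0])   # predicted costs

print(spo_loss(c_pred, c_true, S))       # decision error induced by c_pred
print(spo_plus_loss(c_pred, c_true, S))  # convex surrogate, upper-bounds SPO loss
print(spo_plus_subgrad(c_pred, c_true, S))

Because the SPO+ loss is convex in the predicted cost vector, this subgradient can be chained through a prediction model (for example, a linear model c_hat = B x) and used in stochastic subgradient descent, which is the training approach the paper proposes for the SPO+ loss.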
Pages: 9-26
Number of pages: 18
相关论文
共 62 条
  • [1] Ahuja R.K., 1993, Network flows, DOI DOI 10.21236/ADA594171
  • [2] Business Analytics for Flexible Resource Allocation Under Random Emergencies
    Angalakudati, Mallik
    Balwani, Siddharth
    Calzada, Jorge
    Chatterjee, Bikram
    Perakis, Georgia
    Raad, Nicolas
    Uichanco, Joline
    [J]. MANAGEMENT SCIENCE, 2014, 60 (06) : 1552 - 1573
  • [3] Inverse Optimization with Noisy Data
    Aswani, Anil
    Shen, Zuo-Jun
    Siddiq, Auyon
    [J]. OPERATIONS RESEARCH, 2018, 66 (03) : 870 - 892
  • [4] Balkanski E, 2016, ADV NEURAL INFORM PR, V29, P4017
  • [5] The Limitations of Optimization from Samples
    Balkanski, Eric
    Rubinstein, Aviad
    Singer, Yaron
    [J]. STOC'17: PROCEEDINGS OF THE 49TH ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING, 2017, : 1016 - 1027
  • [6] The Big Data Newsvendor: Practical Insights from Machine Learning
    Ban, Gah-Yi
    Rudin, Cynthia
    [J]. OPERATIONS RESEARCH, 2019, 67 (01) : 90 - 108
  • [7] Machine Learning and Portfolio Optimization
    Ban, Gah-Yi
    El Karoui, Noureddine
    Lim, Andrew E. B.
    [J]. MANAGEMENT SCIENCE, 2018, 64 (03) : 1136 - 1154
  • [8] Convexity, classification, and risk bounds
    Bartlett, PL
    Jordan, MI
    McAuliffe, JD
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2006, 101 (473) : 138 - 156
  • [9] On the difficulty of approximately maximizing agreements
    Ben-David, S
    Eiron, N
    Long, PM
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (03) : 496 - 514
  • [10] Bertsimas D., 2006, Tutorials in Operations Research, P95, DOI [DOI 10.1287/EDUC.1063.0022, https://doi.org/10.1287/educ.1063.0022]