Incorporating machine learning and social determinants of health indicators into prospective risk adjustment for health plan payments

Cited by: 29
Authors
Irvin, Jeremy A. [1]
Kondrich, Andrew A. [1]
Ko, Michael [2]
Rajpurkar, Pranav [1]
Haghgoo, Behzad [1]
Landon, Bruce E. [3, 4]
Phillips, Robert [5]
Petterson, Stephen [6]
Ng, Andrew Y. [1]
Basu, Sanjay [4, 7, 8]
Affiliations
[1] Stanford Univ, Dept Comp Sci, 353 Serra Mall, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Harvard Med Sch, Dept Healthcare Policy, Boston, MA 02115 USA
[4] Harvard Med Sch, Ctr Primary Care, Boston, MA 02115 USA
[5] Amer Board Family Med Fdn, Ctr Professionalism & Value Hlth Care, Lexington, KY USA
[6] Amer Acad Family Phys, Ctr Professionalism & Value Hlth Care, Leawood, KS USA
[7] Collect Hlth, Res & Analyt, San Francisco, CA USA
[8] Imperial Coll London, Sch Publ Hlth, London, England
Keywords
Risk estimation; Machine learning; Social determinants of health; SOCIOECONOMIC-STATUS; LOGISTIC-REGRESSION; MEDICARE; MODELS; INCOME
DOI
10.1186/s12889-020-08735-0
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Background: Risk adjustment models are employed to prevent adverse selection, anticipate budgetary reserve needs, and offer care management services to high-risk individuals. We aimed to address two unknowns about risk adjustment: whether machine learning (ML) and inclusion of social determinants of health (SDH) indicators improve prospective risk adjustment for health plan payments.

Methods: We employed a 2-by-2 factorial design comparing: (i) linear regression versus ML (gradient boosting) and (ii) demographics and diagnostic codes alone, versus additional ZIP code-level SDH indicators. Healthcare claims from privately-insured US adults (2016-2017), and Census data were used for analysis. Data from 1.02 million adults were used for derivation, and data from 0.26 million to assess performance. Model performance was measured using coefficient of determination (R²), discrimination (C-statistic), and mean absolute error (MAE) for the overall population, and predictive ratio and net compensation for vulnerable subgroups. We provide 95% confidence intervals (CI) around each performance measure.

Results: Linear regression without SDH indicators achieved moderate determination (R² 0.327, 95% CI: 0.300, 0.353), error ($6992; 95% CI: $6889, $7094), and discrimination (C-statistic 0.703; 95% CI: 0.701, 0.705). ML without SDH indicators improved all metrics (R² 0.388; 95% CI: 0.357, 0.420; error $6637; 95% CI: $6539, $6735; C-statistic 0.717; 95% CI: 0.715, 0.718), reducing misestimation of cost by $3.5M per 10,000 members. Among people living in areas with high poverty, high wealth inequality, or high prevalence of uninsured, SDH indicators reduced underestimation of cost, improving the predictive ratio by 3% ($200/person/year).

Conclusions: ML improved risk adjustment models and the incorporation of SDH indicators reduced underpayment in several vulnerable populations.
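The performance measures named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names and the toy cost figures are hypothetical, and the definitions of predictive ratio (total predicted cost divided by total actual cost) and net compensation (mean predicted cost minus mean actual cost) are the conventional ones, assumed here because the abstract does not spell them out. The last lines check the reported figure of roughly $3.5M per 10,000 members against the two MAE values quoted in the abstract.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mu = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mu) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average absolute gap between predicted and actual cost."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def predictive_ratio(actual, predicted):
    """Assumed definition: total predicted / total actual cost.
    Values below 1 indicate underestimation for the group."""
    return sum(predicted) / sum(actual)


def net_compensation(actual, predicted):
    """Assumed definition: mean predicted minus mean actual cost.
    Negative values indicate underpayment for the group."""
    return mean(predicted) - mean(actual)


# Hypothetical annual costs (USD) for four members of one subgroup.
actual = [1000.0, 2000.0, 3000.0, 10000.0]
predicted = [1500.0, 1500.0, 3500.0, 8000.0]

r2 = r_squared(actual, predicted)
mae = mean_absolute_error(actual, predicted)
pr = predictive_ratio(actual, predicted)
nc = net_compensation(actual, predicted)

# Arithmetic check using the MAE values quoted in the abstract:
# linear regression $6992 vs. gradient boosting $6637 per member.
mae_reduction_per_member = 6992 - 6637
reduction_per_10k_members = mae_reduction_per_member * 10_000

print(f"R^2={r2:.3f}  MAE=${mae:.0f}  PR={pr:.3f}  net comp=${nc:.0f}")
print(f"MAE reduction per 10,000 members: ${reduction_per_10k_members:,}")
```

On the toy data the predictive ratio falls below 1 and the net compensation is negative, which is the pattern the abstract describes as underestimation of cost in a subgroup. The per-member MAE difference of $355 scales to $3.55M per 10,000 members, consistent with the approximately $3.5M reported.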
Pages: 10