Machine Learning Outperforms Regression Analysis to Predict Next-Season Major League Baseball Player Injuries: Epidemiology and Validation of 13,982 Player-Years From Performance and Injury Profile Trends, 2000-2017

Cited by: 43
Authors
Karnuta, Jaret M. [1, 2]
Luu, Bryan C. [1, 3]
Haeberle, Heather S. [1, 2, 3]
Saluan, Paul M. [1, 2]
Frangiamore, Salvatore J. [1, 2]
Stearns, Kim L. [1, 2]
Farrow, Lutul D. [1, 2]
Nwachukwu, Benedict U. [1, 4]
Verma, Nikhil N. [1, 5]
Makhni, Eric C. [1]
Schickendantz, Mark S. [1, 2]
Ramkumar, Prem N. [1, 2]
Affiliations
[1] Cleveland Clin, 9500 Euclid Ave, Cleveland, OH 44106 USA
[2] Cleveland Clin, Orthopaed Machine Learning Lab, Cleveland, OH 44106 USA
[3] Baylor Coll Med, Dept Orthoped Surg, Houston, TX 77030 USA
[4] Hosp Special Surg, 535 E 70th St, New York, NY 10021 USA
[5] Henry Ford Hlth Syst, Dept Orthoped, West Bloomfield, MI USA
Keywords
machine learning; injury prediction; injury prevention
DOI
10.1177/2325967120963046
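Chinese Library Classification (CLC) codes
R826.8 [Plastic surgery]; R782.2 [Oral and maxillofacial plastic surgery]; R726.2 [Pediatric plastic surgery]; R62 [Plastic surgery (reconstructive surgery)]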
Abstract
Background: Machine learning (ML) allows for the development of a predictive algorithm capable of ingesting historical data on a Major League Baseball (MLB) player to project the player's future availability.
Purpose: To determine the validity of an ML model in predicting next-season injury risk and anatomic injury location for both position players and pitchers in MLB.
Study Design: Descriptive epidemiology study.
Methods: Using 4 online baseball databases, we compiled MLB player data, including age, performance metrics, and injury history. A total of 84 ML algorithms were developed. Each algorithm reported whether the player would sustain an injury the following season, as well as the injury's anatomic site. Validation was determined primarily by the area under the receiver operating characteristic curve (AUC).
Results: Player data were generated from 1931 position players and 1245 pitchers, with a mean follow-up of 4.40 years (13,982 player-years) between 2000 and 2017. Injured players spent a total of 108,656 days on the disabled list, with a mean of 34.21 total days per player. The mean AUC for predicting next-season injuries was 0.76 among position players and 0.65 among pitchers using the top 3 ensemble classification. Back injuries had the highest AUC among both position players and pitchers, at 0.73. Advanced ML models outperformed logistic regression in 13 of 14 cases.
Conclusion: Advanced ML models generally outperformed logistic regression and demonstrated fair capability in predicting publicly reportable next-season injuries, including the anatomic region for position players, although not for pitchers.
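As a rough illustration of the validation metric named in the Methods, the sketch below computes AUC as the probability that a randomly chosen injured player-season receives a higher predicted risk than a randomly chosen uninjured one. It is not the authors' pipeline: the function name `roc_auc` and the toy labels and scores are hypothetical and were chosen only to show the calculation.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = injured, 0 = not injured).

    Computed as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 player-seasons, not taken from the study.
injured_next_season = [1, 0, 1, 0, 0, 1, 0, 0]
predicted_risk = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8, 0.3, 0.4]

auc = roc_auc(injured_next_season, predicted_risk)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.76 (position players) and 0.65 (pitchers) should be read.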
Pages: 10