MLBCD: A machine learning tool for big clinical data

Cited by: 24

Authors:

Luo G. [1]

Affiliation:

[1] Department of Biomedical Informatics, University of Utah, Suite 140, 421 Wakara Way, Salt Lake City, UT 84108

Source:

Health Information Science and Systems, Vol. 3, No. 1

Keywords:

Automatic algorithm selection; Automatic hyper-parameter value selection; Big clinical data; Entity-attribute-value; Machine learning; Pivot

DOI:

10.1186/s13755-015-0011-0

Abstract:

Background: Predictive modeling is fundamental to extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before a machine learning model is trained, the values of one or more model parameters, called hyper-parameters, must typically be specified. Because many healthcare researchers have little machine learning experience, they find it hard to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format, such as entity-attribute-value. These data must be iteratively transformed into relational table format before predictive modeling can be conducted. This transformation is time-consuming and requires computing expertise.

Methods: This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system that aims to address these challenges and facilitate building machine learning predictive models from big clinical data.

Results: The paper describes MLBCD's design in detail.

Conclusions: By making machine learning accessible to healthcare researchers, MLBCD will open up the use of big clinical data and increase the ability to foster biomedical discovery and improve care.

© 2015 Luo.
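The second challenge above, turning entity-attribute-value (EAV) data into a relational table, is essentially a pivot. The following is a minimal Python sketch under simplifying assumptions, not MLBCD's implementation: the input is a list of hypothetical (patient, attribute, value) triples, the function name `pivot_eav` is hypothetical, and repeated measurements of the same attribute for the same patient are collapsed by a caller-supplied `aggregate` function (by default, the last value in input order).

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical EAV triple: (entity, attribute, value),
# e.g. (patient ID, lab test name, test result).
EAVRow = Tuple[str, str, float]


def pivot_eav(
    rows: Iterable[EAVRow],
    aggregate: Callable[[Sequence[float]], float] = lambda values: values[-1],
) -> Tuple[List[str], List[List[object]]]:
    """Pivot EAV triples into a relational table.

    The result has one row per entity and one column per attribute.
    Missing (entity, attribute) pairs become None.
    """
    # Group every value by entity, then by attribute, preserving input order.
    grouped: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    attributes = set()
    for entity, attribute, value in rows:
        grouped[entity][attribute].append(value)
        attributes.add(attribute)

    columns = sorted(attributes)
    table: List[List[object]] = []
    for entity in sorted(grouped):
        by_attribute = grouped[entity]
        row: List[object] = [entity]
        # Collapse repeated measurements with `aggregate`. The `in` check
        # avoids creating empty lists in the inner defaultdict.
        row.extend(
            aggregate(by_attribute[a]) if a in by_attribute else None
            for a in columns
        )
        table.append(row)
    return ["entity"] + columns, table


if __name__ == "__main__":
    eav = [
        ("p1", "glucose", 5.4),
        ("p1", "hba1c", 6.1),
        ("p2", "glucose", 7.2),
        ("p1", "glucose", 5.9),  # a second glucose measurement for p1
    ]
    header, table = pivot_eav(eav)
    print(header)  # ['entity', 'glucose', 'hba1c']
    for row in table:
        print(row)  # ['p1', 5.9, 6.1], then ['p2', 7.2, None]
```

Passing a different `aggregate`, such as `max` or `statistics.mean`, produces a different feature table from the same EAV data. This suggests why such a transformation may need to be repeated while exploring which features work best for a predictive model.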

References: 138
