MLBCD: A machine learning tool for big clinical data

Cited by: 24
Authors
Luo G. [1]
Affiliations
[1] Department of Biomedical Informatics, University of Utah, Suite 140, 421 Wakara Way, Salt Lake City, 84108, UT
Keywords
Automatic algorithm selection; Automatic hyper-parameter value selection; Big clinical data; Entity-attribute-value; Machine learning; Pivot
DOI
10.1186/s13755-015-0011-0
Abstract
Background: Predictive modeling is fundamental for extracting value from large clinical data sets ("big clinical data"), advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling, but two factors make it challenging for healthcare researchers. First, before a machine learning model can be trained, an algorithm must be chosen and the values of one or more model parameters, called hyper-parameters, must typically be specified. Healthcare researchers often lack machine learning experience, which makes it hard for them to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format and must be iteratively transformed into the relational table format before predictive modeling can be conducted. This transformation is time-consuming and requires computing expertise.
Methods: This paper presents our vision for and design of MLBCD (Machine Learning for Big Clinical Data), a new software system that aims to address these challenges and facilitate building machine learning predictive models using big clinical data.
Results: The paper describes MLBCD's design in detail.
Conclusions: By making machine learning accessible to healthcare researchers, MLBCD will open up the use of big clinical data and increase the ability to foster biomedical discovery and improve care.
© 2015 Luo.
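The transformation into the relational table format mentioned in the abstract can be illustrated with a small pivot. This is a minimal sketch only, assuming the "special format" is the entity-attribute-value (EAV) layout named in the keywords; it is not MLBCD's implementation, and the function name `pivot_eav`, the column name `entity`, and the sample rows are all hypothetical.

```python
from collections import defaultdict


def pivot_eav(rows, attributes):
    """Pivot (entity, attribute, value) triples into one row per entity.

    Hypothetical helper for illustration; not taken from MLBCD.
    Entities that have none of the requested attributes are omitted.
    """
    table = defaultdict(dict)
    for entity, attribute, value in rows:
        if attribute in attributes:
            # If an attribute repeats for an entity, the last value wins.
            table[entity][attribute] = value
    # One output row per entity; attributes with no value become None.
    return [
        {"entity": entity, **{a: cols.get(a) for a in attributes}}
        for entity, cols in sorted(table.items())
    ]


# Made-up EAV rows: (patient id, attribute name, value).
eav_rows = [
    (1, "age", 54),
    (1, "systolic_bp", 130),
    (2, "age", 61),
    (2, "heart_rate", 72),
]

relational = pivot_eav(eav_rows, ["age", "systolic_bp"])
for row in relational:
    print(row)
```

Each call selects a fixed set of attributes as columns, so a researcher who later wants more clinical attributes has to pivot again with a different list. That is one way to read the word "iteratively" in the abstract.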