Mining real-world high dimensional structured data in medicine and its use in decision support. Some different perspectives on unknowns, interdependency, and distinguishability

Cited by: 4
Authors
Robson, Barry [1, 2]
Boray, S. [1]
Weisman, J. [2 ]
Affiliations
[1] Ingine Inc, Cleveland, OH 44106 USA
[2] Dirac Fdn, Burford, Oxon, England
Keywords
Real world data; Assumptions; Approximations; Unknowns; Interdependency; Distinguishability; Coherence; Inference net; Bayes' rule; Bayes net; Hyperbolic Dirac net; Clinical decision support; UNIVERSAL EXCHANGE; INFERENCE LANGUAGE; BAYESIAN NETWORKS; WEB; SUGGESTIONS; ACCURACY
DOI
10.1016/j.compbiomed.2021.105118
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
There are many difficulties in extracting and using knowledge for medical analytic and predictive purposes from Real-World Data, even when the data is already well structured in the manner of a large spreadsheet. Preparative curation and standardization or "normalization" of such data involves a variety of chores but underlying them is an interrelated set of fundamental problems that can in part be dealt with automatically during the datamining and inference processes. These fundamental problems are reviewed here and illustrated and investigated with examples. They concern the treatment of unknowns, the need to avoid independency assumptions, and the appearance of entries that may not be fully distinguished from each other. Unknowns include errors detected as implausible (e.g., out of range) values that are subsequently converted to unknowns. These problems are further impacted by high dimensionality and problems of sparse data that inevitably arise from high-dimensional datamining even if the data is extensive. All these considerations are different aspects of incomplete information, though they also relate to problems that arise if care is not taken to avoid or ameliorate consequences of including the same information twice or more, or if misleading or inconsistent information is combined. This paper addresses these aspects from a slightly different perspective using the Q-UEL language and inference methods based on it by borrowing some ideas from the mathematics of quantum mechanics and information theory. It takes the view that detection and correction of probabilistic elements of knowledge subsequently used in inference need only involve testing and correction so that they satisfy certain extended notions of coherence between probabilities. This is by no means the only possible view, and it is explored here and later compared with a related notion of consistency.
Pages: 27