Perspectives on making big data analytics work for oncology

Cited by: 27
Authors
El Naqa, Issam [1]
Affiliations
[1] Univ Michigan, Dept Radiat Oncol, Ann Arbor, MI 48109 USA
Keywords
Big data; Oncology; Machine learning; Clinical decision support; PREDICT RADIATION PNEUMONITIS; DOSE-VOLUME; BAYESIAN NETWORK; NEURAL-NETWORK; RADIOTHERAPY OUTCOMES; TEXTURAL FEATURES; PROSTATE-CANCER; TUMOR RESPONSE; NECK-CANCER; FDG-PET
DOI
10.1016/j.ymeth.2016.08.010
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Oncology, with its unique combination of clinical, physical, technological, and biological data, provides an ideal case study for applying big data analytics to improve cancer treatment safety and outcomes. An oncology treatment course such as chemoradiotherapy can generate a large pool of information carrying the 5 Vs hallmarks of big data. These data comprise a heterogeneous mixture of patient demographics, radiation/chemo dosimetry, multimodality imaging features, and biological markers generated over a treatment period that can span a few days to several weeks. Efforts using commercial and in-house tools are underway to facilitate data aggregation, ontology creation, sharing, visualization, and varying analytics in a secure environment. However, open questions related to proper data structure representation and effective analytics tools to support oncology decision-making need to be addressed. It is recognized that oncology data constitute a mix of structured (tabulated) and unstructured (electronic documents) data that need to be processed to facilitate searching and subsequent knowledge discovery from relational or NoSQL databases. In this context, methods based on advanced analytics and image feature extraction for oncology applications will be discussed. On the other hand, the classical p (variables) >> n (samples) inference problem of statistical learning is challenged in the big data realm, and this is particularly true for oncology applications, where p-omics is witnessing exponential growth while the number of cancer incidences has generally plateaued over the past 5 years, leading to a quasi-linear growth in samples per patient. Within the big data paradigm, this kind of phenomenon may yield undesirable effects such as echo chamber anomalies, the Yule-Simpson reversal paradox, or misleading ghost analytics.
In this work, we will present these effects as they pertain to oncology and engage small thinking methodologies to counter them, ranging from incorporating prior knowledge and using information-theoretic techniques to modern ensemble machine learning approaches, or a combination of these. We will particularly discuss the pros and cons of different approaches to improve mining of big data in oncology. (C) 2016 Elsevier Inc. All rights reserved.
Pages: 32-44
Page count: 13