Exploratory Data Mining for Subgroup Cohort Discoveries and Prioritization

Cited by: 17
Authors
Liu, Danlu [1]
Baskett, William [2]
Beversdorf, David [3, 4, 5, 6]
Shyu, Chi-Ren [7, 8]
Affiliations
[1] Univ Missouri, Dept Elect Engn & Comp Sci, Columbia, MO 65211 USA
[2] Univ Missouri, Inst Data Sci & Informat, Columbia, MO 65211 USA
[3] Univ Missouri, Dept Radiol, Columbia, MO 65211 USA
[4] Univ Missouri, Dept Neurol, Columbia, MO 65211 USA
[5] Univ Missouri, Dept Psychol Sci, Columbia, MO 65211 USA
[6] Univ Missouri, Thompson Ctr Autism & Neurodev Disorders, Columbia, MO 65211 USA
[7] Univ Missouri, Dept Elect Engn & Comp Sci, Inst Data Sci & Informat, Columbia, MO 65211 USA
[8] Univ Missouri, Sch Med, Columbia, MO 65211 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Sociology; Statistics; Data mining; Clinical trials; Informatics; Drugs; Bioinformatics; Contrast mining; exploratory mining; patient cohort identification; subgroup discovery; AUTISM SPECTRUM DISORDER; CONTRAST SET; BIG DATA; GENE; CLASSIFICATION; ASSOCIATION; EXPRESSION; RESOURCE; LANGUAGE; KIRREL3;
DOI
10.1109/JBHI.2019.2939149
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Finding small homogeneous subgroup cohorts in large heterogeneous populations is a critical process for hypothesis development in biomedical research. Concurrent computational approaches are still lacking in robust answers to the question "what hypotheses are likely to be novel and to produce clinically relevant results with well thought-out study designs?" We have developed a novel subgroup discovery method which employs a deep exploratory mining process to slice and dice thousands of potential subpopulations and prioritize potential cohorts based on their explainable contrast patterns and which may provide interventionable insights. We conducted computational experiments on both synthesized data and a clinical autism data set to assess performance quantitatively for coverage of pre-defined cohorts and qualitatively for novel knowledge discovery, respectively. We also conducted a scaling analysis using a distributed computing environment to suggest computational resource needs for when the subpopulation number increases. This work will provide a robust data-driven framework to automatically tailor potential interventions for precision health.
Pages: 1456-1468
Page count: 13