Identification and epidemiological characterization of Type-2 diabetes sub-population using an unsupervised machine learning approach

Cited by: 20
Authors
Bej, Saptarshi [1, 2]
Sarkar, Jit [3, 4]
Biswas, Saikat [5]
Mitra, Pabitra [6]
Chakrabarti, Partha [3, 4]
Wolkenhauer, Olaf [1, 2, 7]
Affiliations
[1] Univ Rostock, Dept Syst Biol & Bioinformat, Rostock, Germany
[2] Tech Univ Munich, Leibniz Inst Food Syst Biol, Munich, Germany
[3] CSIR Indian Inst Chem Biol, Div Cell Biol & Physiol, Kolkata, India
[4] Acad Innovat & Sci Res, Ghaziabad, India
[5] Indian Inst Technol, Adv Technol Dev Ctr, Kharagpur, W Bengal, India
[6] Indian Inst Technol, Dept Comp Sci & Engn, Kharagpur, W Bengal, India
[7] Stellenbosch Univ, Stellenbosch Inst Adv Study STIAS, Wallenberg Res Ctr, Stellenbosch, South Africa
Keywords
SOCIOECONOMIC POSITION; FOOD GROUPS; FOLLOW-UP; MELLITUS; RISK; ASSOCIATION; MEN
DOI
10.1038/s41387-022-00206-2
Chinese Library Classification number
R5 [Internal Medicine]
Subject classification code
1002; 100201
Abstract
Background: Studies on Type-2 Diabetes Mellitus (T2DM) have revealed heterogeneous sub-populations in terms of underlying pathologies. However, the identification of such sub-populations in epidemiological datasets remains unexplored. Here we focus on detecting T2DM clusters in epidemiological data, specifically analysing the National Family Health Survey-4 (NFHS-4) dataset from India, which contains a wide spectrum of features, including the medical history, dietary and addiction habits, and socio-economic and lifestyle patterns of 10,125 T2DM patients.
Methods: Epidemiological data are challenging to analyse because of the diverse feature types they contain. For this reason, applying the state-of-the-art dimension reduction tool UMAP in the conventional way was found to be ineffective for the NFHS-4 dataset. We implemented a distributed clustering workflow that combines different similarity measure settings of UMAP to cluster continuous, ordinal and nominal features separately. We then integrated the reduced dimensions from each feature-type-distributed clustering to obtain an interpretable and unbiased clustering of the data.
Results: Our analysis reveals four significant clusters, two of which consist mainly of non-obese T2DM patients. These non-obese clusters have a lower mean age and consist largely of rural residents. Surprisingly, in one of the obese clusters 90% of the T2DM patients followed a non-vegetarian diet, although they did not show an increased intake of plant-based protein-rich foods.
Conclusions: From a methodological perspective, we show that for the diverse data types that are frequent in epidemiological datasets, feature-type-distributed clustering using UMAP is effective, in contrast to the conventional use of the UMAP algorithm. The application of a UMAP-based clustering workflow to this type of dataset is novel in itself. Our findings demonstrate heterogeneity among Indian T2DM patients with regard to socio-demography and dietary patterns. From our analysis, we conclude that the existence of significant non-obese T2DM sub-populations, characterized by younger age groups and economic disadvantage, calls for different screening criteria for T2DM among rural Indian residents.
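The feature-type-distributed workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say which similarity measure was used for which feature type, so the metric pairing below (euclidean, canberra, hamming) is an assumed choice among metrics that umap-learn supports. The function names, the column names in the usage note, and the final DBSCAN step are likewise hypothetical.

```python
from collections import defaultdict

# Assumed metric per feature type. These are valid umap-learn metric names,
# but the pairing is illustrative and not taken from the paper.
METRIC_BY_TYPE = {
    "continuous": "euclidean",
    "ordinal": "canberra",
    "nominal": "hamming",
}


def group_columns_by_type(feature_types):
    """Map each feature type to the list of column names declared with it."""
    groups = defaultdict(list)
    for column, ftype in feature_types.items():
        if ftype not in METRIC_BY_TYPE:
            raise ValueError(f"unknown feature type for {column!r}: {ftype!r}")
        groups[ftype].append(column)
    return dict(groups)


def distributed_embedding(frame, feature_types, n_components=2, random_state=0):
    """Embed each feature-type block separately with UMAP, then concatenate.

    `frame` is assumed to be a pandas DataFrame whose nominal and ordinal
    columns are already integer-coded.
    """
    # Third-party imports are kept local so the helper above needs only
    # the standard library.
    import numpy as np
    import umap  # provided by the umap-learn package

    blocks = []
    for ftype, columns in group_columns_by_type(feature_types).items():
        reducer = umap.UMAP(
            n_components=n_components,
            metric=METRIC_BY_TYPE[ftype],
            random_state=random_state,
        )
        blocks.append(reducer.fit_transform(frame[columns].to_numpy(dtype=float)))
    # Integrate the reduced dimensions from every feature type side by side.
    return np.hstack(blocks)


def cluster_embedding(embedding, eps=0.5, min_samples=10):
    """Cluster the integrated embedding (DBSCAN is an assumed choice)."""
    from sklearn.cluster import DBSCAN

    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(embedding)
```

With a DataFrame `df` and a hypothetical mapping such as `{"age": "continuous", "wealth_index": "ordinal", "residence": "nominal"}`, calling `cluster_embedding(distributed_embedding(df, mapping))` would return one cluster label per patient. The values of `eps` and `min_samples` are placeholders that would need tuning on real data.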
Pages: 11