A Python Clustering Analysis Protocol of Genes Expression Data Sets

Cited by: 7
Authors
Agapito, Giuseppe [1 ,2 ]
Milano, Marianna [2 ,3 ]
Cannataro, Mario [2 ,3 ]
Affiliations
[1] Univ Magna Graecia Catanzaro, Dept Law Econ & Social Sci, I-88100 Catanzaro, Italy
[2] Magna Graecia Univ Catanzaro, Data Analyt Res Ctr, I-88100 Catanzaro, Italy
[3] Magna Graecia Univ Catanzaro, Dept Med & Clin Surg, I-88100 Catanzaro, Italy
Keywords
data mining; unsupervised learning; clustering; microarrays; SNPs; DEGs; MICROARRAY DATA; NCBI GEO; PATHWAY; IDENTIFICATION; PROFILES; MILLIONS; TOOLS;
DOI
10.3390/genes13101839
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Gene expression and SNP data hold great potential for a new understanding of disease prognosis, drug sensitivity, and toxicity evaluation. Cluster analysis is used to analyze data that do not come with predefined subgroups. The goal is to use the data themselves to recognize meaningful and informative subgroups. In addition, cluster analysis supports data reduction, exposes hidden patterns, and generates hypotheses about the relationship between genes and phenotypes. Cluster analysis can also be used to identify biomarkers and to build computational predictive models. The methods used to analyze microarray data can profoundly influence the interpretation of the results. Therefore, a basic understanding of these computational tools is necessary for optimal experimental design and meaningful data analysis. This manuscript provides an analysis protocol for analyzing gene expression data sets effectively with the K-means and DBSCAN algorithms. The general protocol enables the analysis of omics data to identify subsets of features with low redundancy and high robustness, speeding up the identification of new biomarkers through pathway enrichment analysis. To demonstrate the effectiveness of the clustering analysis protocol, we analyze a real data set from the GEO database. Finally, the manuscript provides some best practices and tips for overcoming common issues in the analysis of omics data sets through unsupervised learning.
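As a rough illustration of the kind of clustering step the abstract refers to, the sketch below runs a plain K-means (Lloyd's algorithm) on a tiny expression matrix. It is not the authors' protocol: the function name `kmeans`, the six made-up expression profiles, and the choice of `k=2` are all assumptions for illustration, and DBSCAN, the other algorithm the abstract names, is not shown.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm (hypothetical helper); returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct rows chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up data: 6 genes (rows) x 3 samples (columns), two obvious groups.
expression = [
    [1.0, 1.2, 0.9],
    [1.1, 0.8, 1.0],
    [0.9, 1.0, 1.1],
    [8.0, 8.2, 7.9],
    [8.1, 7.8, 8.0],
    [7.9, 8.0, 8.1],
]

labels, centroids = kmeans(expression, k=2)
print(labels)
```

On real expression data one would normally normalize or scale the matrix first and choose `k` with a validity measure rather than fixing it by hand; those steps are left out here to keep the sketch short.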
Pages: 22