CORAZON: a web server for data normalization and unsupervised clustering based on expression profiles

Cited by: 1
Authors
Ramos, Thais A. R. [1, 2]
Maracaja-Coutinho, Vinicius [1, 2, 3]
Ortega, J. Miguel [4]
do Rego, Thais G. [1, 5]
Affiliations
[1] Univ Fed Rio Grande do Norte, Programa Posgrad Bioinformat, Bioinformat Multidisciplinary Environm BioME, Inst Metropole Digital, Natal, RN, Brazil
[2] Univ Chile, Fac Ciencias Quim & Farmaceut, Adv Ctr Chron Dis ACCDiS, Santiago, Chile
[3] Inst Vand, Joao Pessoa, Paraiba, Brazil
[4] Univ Fed Minas Gerais, Inst Ciencias Biol, Dept Bioquim & Imunol, Belo Horizonte, MG, Brazil
[5] Univ Fed Paraiba, Dept Informat, Ctr Informat, Joao Pessoa, Paraiba, Brazil
Keywords
Gene expression; Machine learning; Clustering; Normalization; Expression profiling; Transcriptome analysis; Non-coding RNAs; Web server; NONCODING RNAS; PATTERNS; SEQ
DOI
10.1186/s13104-020-05171-6
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Objective: Data normalization and clustering are mandatory steps in gene expression analysis and in downstream analyses, respectively. However, user-friendly implementations of these methodologies are available only under expensive licensing agreements or as stand-alone scripts, which poses a significant obstacle for users with limited computational skills.
Results: We developed an online tool called CORAZON (Correlations Analyses Zipper Online), which implements three unsupervised learning methods to cluster gene expression datasets in a user-friendly environment. It supports eight gene expression normalization/transformation methodologies as well as the attribute's influence. The normalizations that require gene length can be applied only to RNA-seq data, whereas the others can be used with microarray and/or NanoString data. The performance of the clustering methodologies was evaluated with five models, yielding accuracies between 92 and 100%. We applied our tool to obtain functional insights into non-coding RNAs (ncRNAs) based on Gene Ontology enrichment of clusters in a dataset generated by the ENCODE project. The clusters in which the majority of transcripts are coding genes were enriched in the Cellular, Metabolic, Transports, and Systems Development categories. Meanwhile, the ncRNAs were enriched in the Detection of Stimulus, Sensory Perception, Immunological System, and Digestion categories. CORAZON source code is freely available, and the web server can be accessed at http://corazon.integrativebioinformatics.me.
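To illustrate why some normalizations need gene length, here is a minimal sketch of transcripts per million (TPM), a widely used length-dependent normalization for RNA-seq counts. It is an assumption that TPM is representative of the length-dependent methods mentioned in the abstract; the record does not list the eight methodologies, and the function name `tpm` and its inputs are hypothetical, not taken from the CORAZON source code.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for a single sample.

    counts     -- raw read counts, one per gene (hypothetical input)
    lengths_bp -- gene lengths in base pairs, in the same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")

    # Step 1: divide each count by gene length in kilobases (reads per kilobase).
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that the values of the sample sum to one million.
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]


if __name__ == "__main__":
    # Three genes whose counts are proportional to their lengths
    # end up with equal TPM values.
    print(tpm([10, 20, 30], [1000, 2000, 3000]))
```

Because the first step divides by gene length, a calculation of this kind only makes sense for data where counts scale with transcript length, which is consistent with the abstract's statement that such normalizations apply to RNA-seq rather than to microarray or NanoString data.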
Pages: 7