The "Gene Cube": A Novel Approach to Three-dimensional Clustering of Gene Expression Data

Cited by: 9

Authors:

Lambrou, George I. [1,2]

Sdraka, Maria [2]

Koutsouris, Dimitrios [2]

Affiliations:

[1] Natl & Kapodistrian Univ Athens, Dept Pediat 1, Choremeio Res Lab, Thivon & Levadeias 8, Athens 11527, Greece

[2] Natl & Kapodistrian Univ Athens, Sch Elect & Comp Engn, Biomed Engn Lab, Heroon Polytecniou 9, Athens 15780, Greece

Source:

CURRENT BIOINFORMATICS | 2019 / Vol. 14 / Issue 08

Keywords:

Machine learning; clustering; chromosomes; gene expression; DNA microarrays; algorithms; CHROMOSOMAL DOMAINS; MICROARRAY; ALGORITHM; ARRAY;

DOI:

10.2174/1574893614666190116170406

Chinese Library Classification:

Q5 [Biochemistry];

Discipline classification code:

071010 ; 081704 ;

Abstract:

Background: A very popular technique for isolating significant genes from cancerous tissues is the application of various clustering algorithms to data obtained from DNA microarray experiments. Aim: The objective of the present work is to take the chromosomal identity of every gene into consideration before clustering, by creating a three-dimensional structure of the form Chromosomes × Genes × Samples. The k-means algorithm and a triclustering technique called delta-TRIMAX are then applied independently to the structure. Materials and Methods: The present algorithm was developed in the Python programming language (v. 3.5.1). For this work, we used two distinct public datasets containing healthy control samples and tissue samples from bladder cancer patients. Background correction was performed by subtracting the median global background from the median local background from the signal intensity. Quantile normalization was applied for sample normalization. Three known algorithms were applied to test the "gene cube": classical k-means, a transformed 3D k-means, and delta-TRIMAX. Results: Our proposed data structure consists of a 3D matrix of the form Chromosomes × Genes × Samples. Clustering analysis of that structure allowed us to identify gene expression patterns among samples, genes and chromosomes. Discussion: To the best of our knowledge, this is the first time that such a structure has been reported, and it constitutes a useful tool for gene classification from high-throughput gene expression experiments. Conclusions: Such approaches could prove useful for understanding disease mechanics, and tumors in particular.
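As an illustration of the data structure described in the abstract, the following is a minimal sketch, not the authors' implementation. It quantile-normalizes a toy genes × samples matrix and arranges it into a Chromosomes × Genes × Samples array. The function names `quantile_normalize` and `build_gene_cube`, the toy data, and the choice to pad chromosomes with fewer genes using NaN are all assumptions made for illustration; the abstract does not say how unequal gene counts per chromosome are handled.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Assumes no tied values within a column (ties are not averaged here).
    """
    # Rank of every value within its own column (0 = smallest).
    ranks = np.argsort(np.argsort(expr, axis=0), axis=0)
    # Reference distribution: mean of the sorted columns, one value per rank.
    mean_sorted = np.sort(expr, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return mean_sorted[ranks]


def build_gene_cube(expr, chrom_labels):
    """Arrange a genes x samples matrix as chromosomes x genes x samples.

    Hypothetical helper: chromosomes with fewer genes are padded with NaN
    (an assumption, not a detail taken from the abstract).
    """
    labels = np.asarray(chrom_labels)
    chroms = sorted(set(chrom_labels))
    max_genes = max(int((labels == c).sum()) for c in chroms)
    cube = np.full((len(chroms), max_genes, expr.shape[1]), np.nan)
    for i, c in enumerate(chroms):
        rows = expr[labels == c]
        cube[i, : rows.shape[0], :] = rows
    return chroms, cube


# Toy data: 6 genes, 4 samples, made-up chromosome assignments.
rng = np.random.default_rng(0)
expr = rng.lognormal(size=(6, 4))
chrom_labels = ["chr1", "chr1", "chr1", "chr2", "chr2", "chrX"]

normalized = quantile_normalize(expr)
chroms, cube = build_gene_cube(normalized, chrom_labels)
print(cube.shape)  # (3, 3, 4)
```

A clustering step such as k-means would then operate on slices or flattened views of `cube`, with the NaN padding masked out first.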


Pages: 721-727

Page count: 7
