GFS: fuzzy preprocessing for effective gene expression analysis
Cited by: 21
Authors:
Belorkar, Abha [1]; Wong, Limsoon [1]
Affiliation:
[1] Natl Univ Singapore, Sch Comp, 13 Comp Dr, Singapore 117417, Singapore
Keywords:
Gene expression analysis;
Fuzzy scoring;
Preprocessing;
Normalization;
LEUKEMIA;
PREDICTION;
MUSCLE;
DMD;
DOI:
10.1186/s12859-016-1327-8
Chinese Library Classification (CLC) number:
Q5 [Biochemistry];
Discipline classification codes:
071010 ;
081704 ;
Abstract:
Background: Gene expression data produced on high-throughput platforms such as microarrays is susceptible to considerable variation that obscures useful biological information. Preprocessing the data with a suitable normalization method is therefore necessary, and has a direct and substantial impact on the quality of downstream analysis. However, standard normalization methods are known to perform poorly, especially in the presence of substantial batch effects and heterogeneity in gene expression data. Results: We present Gene Fuzzy Score (GFS), a simple preprocessing technique that largely reduces obscuring variation while retaining useful biological information. Using four sets of publicly available datasets containing batch effects and heterogeneity, we compare GFS with three standard normalization techniques as well as raw gene expression. Each method is evaluated with respect to the quality, consistency, and biological coherence of its processed output. GFS outperforms the other transformation techniques in all three aspects. Conclusion: Our approach to preprocessing is a stronger alternative to popular normalization techniques. We demonstrate that it achieves the essential goal of preprocessing: it is effective at making expression values from multiple samples comparable, even when they come from separate platforms, are in independent batches, or belong to a heterogeneous phenotype.
Pages: 16
Related papers
17 items in total
[1] MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia [J]. Armstrong, SA; Staunton, JE; Silverman, LB; Pieters, R; de Boer, ML; Minden, MD; Sallan, SE; Lander, ES; Golub, TR; Korsmeyer, SJ. NATURE GENETICS, 2002, 30 (01): 41-47

[2] Quantitative proteomics signature profiling based on network contextualization [J]. Bin Goh, Wilson Wen; Guo, Tiannan; Aebersold, Ruedi; Wong, Limsoon. BIOLOGY DIRECT, 2015, 10

[3] Analysis of microarray data using Z score transformation [J]. Cheadle, C; Vawter, MP; Freed, WJ; Becker, KG. JOURNAL OF MOLECULAR DIAGNOSTICS, 2003, 5 (02): 73-81

[4] From sets to graphs: towards a realistic enrichment analysis of transcriptomic systems [J]. Geistlinger, Ludwig; Csaba, Gergely; Kueffner, Robert; Mulder, Nicola; Zimmer, Ralf. BIOINFORMATICS, 2011, 27 (13): I366-I373

[5] Evaluating feature-selection stability in next-generation proteomics [J]. Goh, Wilson Wen Bin; Wong, Limsoon. JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2016, 14 (05)

[6] Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring [J]. Golub, TR; Slonim, DK; Tamayo, P; Huard, C; Gaasenbeek, M; Mesirov, JP; Coller, H; Loh, ML; Downing, JR; Caligiuri, MA; Bloomfield, CD; Lander, ES. SCIENCE, 1999, 286 (5439): 531-537

[7] Gene expression comparison of biopsies from Duchenne muscular dystrophy (DMD) and normal skeletal muscle [J]. Haslett, JN; Sanoudou, D; Kho, AT; Bennett, RR; Greenberg, SA; Kohane, IS; Beggs, AH; Kunkel, LM. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2002, 99 (23): 15000-15005

[8] Tackling the widespread and critical impact of batch effects in high-throughput data [J]. Leek, Jeffrey T.; Scharpf, Robert B.; Bravo, Hector Corrada; Simcha, David; Langmead, Benjamin; Johnson, W. Evan; Geman, Donald; Baggerly, Keith; Irizarry, Rafael A. NATURE REVIEWS GENETICS, 2010, 11 (10): 733-739

[9] Finding consistent disease subnetworks using PFSNet [J]. Lim, Kevin; Wong, Limsoon. BIOINFORMATICS, 2014, 30 (02): 189-196

[10] A comparison of batch effect removal methods for enhancement of prediction performance using MAQC-II microarray gene expression data [J]. Luo, J.; Schumacher, M.; Scherer, A.; Sanoudou, D.; Megherbi, D.; Davison, T.; Shi, T.; Tong, W.; Shi, L.; Hong, H.; Zhao, C.; Elloumi, F.; Shi, W.; Thomas, R.; Lin, S.; Tillinghast, G.; Liu, G.; Zhou, Y.; Herman, D.; Li, Y.; Deng, Y.; Fang, H.; Bushel, P.; Woods, M.; Zhang, J. PHARMACOGENOMICS JOURNAL, 2010, 10 (04): 278-291
