Data-driven characterization of molecular phenotypes across heterogeneous sample collections

Cited by: 18
Authors
Mehtonen, Juha [1]
Polonen, Petri [1]
Hayrynen, Sergei [2]
Dufva, Olli [3, 4]
Lin, Jake [2]
Liuksiala, Thomas [2, 5, 6]
Granberg, Kirsi [2]
Lohi, Olli [5, 6]
Hautamaki, Ville [7]
Nykter, Matti [2]
Heinaniemi, Merja [1]
Affiliations
[1] Univ Eastern Finland, Sch Med, Inst Biomed, Kuopio, Finland
[2] Tampere Univ, Fac Med & Hlth Technol, Tampere, Finland
[3] Univ Helsinki, Hematol Res Unit Helsinki, Helsinki, Finland
[4] Helsinki Univ Hosp, Comprehens Canc Ctr, Dept Hematol, Helsinki, Finland
[5] Tampere Univ, Tampere Ctr Child Hlth Res, Tampere, Finland
[6] Tampere Univ Hosp, Tampere, Finland
[7] Univ Eastern Finland, Sch Comp, Joensuu, Finland
Funding
Academy of Finland
Keywords
GENE-EXPRESSION; CLASSIFICATION; LEUKEMIA;
DOI
10.1093/nar/gkz281
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Existing large gene expression data repositories hold enormous potential to elucidate disease mechanisms, characterize changes in cellular pathways, and stratify patients based on molecular profiles. To achieve this goal, integrative resources and tools are needed that allow comparison of results across datasets and data types. We propose an intuitive approach for data-driven stratification of molecular profiles and benchmark our methodology using the dimensionality reduction algorithm t-distributed stochastic neighbor embedding (t-SNE) with multi-study and multi-platform data on hematological malignancies. Our approach enables assessing the contribution of biological versus technical variation to sample clustering, direct incorporation of additional datasets into the same low-dimensional representation, comparison of molecular disease subtypes identified from separate t-SNE representations, and characterization of the obtained clusters based on pathway databases and additional data. In this manner, we performed an integrative analysis across multi-omics acute myeloid leukemia studies. Our approach indicated new molecular subtypes with differential survival and drug responsiveness among samples lacking fusion genes, including a novel myelodysplastic syndrome-like cluster and a cluster characterized by CEBPA mutations and differential activity of the S-adenosylmethionine-dependent DNA methylation pathway. In summary, integration across multiple studies can help to identify novel molecular disease subtypes and generate insight into disease biology.
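The abstract mentions assessing how much biological versus technical variation contributes to sample clustering. The sketch below illustrates one simple way to think about that question and is not the metric used in the paper: it computes cluster purity against a biological annotation and against a technical one. The function name `cluster_purity` and the toy cluster assignments, disease labels, and study labels are all hypothetical.

```python
from collections import Counter, defaultdict


def cluster_purity(clusters, labels):
    """Fraction of samples carrying the majority label of their own cluster.

    Illustrative helper (an assumption, not taken from the paper).
    """
    if len(clusters) != len(labels):
        raise ValueError("clusters and labels must have the same length")
    if not clusters:
        raise ValueError("at least one sample is required")

    # Count how often each label occurs within each cluster.
    by_cluster = defaultdict(Counter)
    for cluster_id, label in zip(clusters, labels):
        by_cluster[cluster_id][label] += 1

    # Sum the size of the majority label group in every cluster.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(clusters)


# Hypothetical toy data: cluster assignments (e.g. obtained from a 2-D embedding),
# a biological annotation (disease) and a technical annotation (source study).
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
disease = ["AML", "AML", "AML", "ALL", "ALL", "ALL", "CLL", "CLL"]
study = ["s1", "s2", "s1", "s2", "s1", "s2", "s1", "s2"]

disease_purity = cluster_purity(clusters, disease)
study_purity = cluster_purity(clusters, study)

print(f"purity vs disease: {disease_purity:.3f}")
print(f"purity vs study:   {study_purity:.3f}")
```

In this toy case the clusters follow the disease labels much more closely than the study labels, which is the pattern one would hope for when biological signal dominates. Purity depends on how many labels there are and how balanced they are, so a real comparison would need an appropriate baseline.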
Pages: 12