SMRT: Randomized Data Transformation for Cancer Subtyping and Big Data Analysis

Cited by: 5
Authors
Nguyen, Hung [1]
Tran, Duc [1]
Tran, Bang [1]
Roy, Monikrishna [1]
Cassell, Adam [1]
Dascalu, Sergiu [1]
Draghici, Sorin [2]
Nguyen, Tin [1]
Affiliations
[1] Univ Nevada, Dept Comp Sci & Engn, Reno, NV 89557 USA
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48202 USA
Funding
US National Science Foundation
Keywords
cancer subtyping; multi-omics integration; web application; CRAN package; survival analysis; DISCOVERY; MODULES; GENE; SURVIVAL; TUMORS; JOINT;
DOI
10.3389/fonc.2021.725133
Chinese Library Classification
R73 [Oncology]
Discipline classification code
100214
Abstract
Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) the scalable analysis pipeline allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) the ability to offer users a convenient data analysis pipeline through a web application. We also improve the efficiency of our ensemble-based perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. In addition, SMRT is extremely fast, being able to analyze hundreds of thousands of samples in minutes. The web application is available at http://SMRT.tinnguyen-lab.com. The R package will be deposited to CRAN as part of our PINSPlus software suite.
Pages: 11