Block-Constraint Robust Principal Component Analysis and its Application to Integrated Analysis of TCGA Data

Cited by: 16
Authors
Liu, Jin-Xing [1]
Gao, Ying-Lian [2]
Zheng, Chun-Hou [1]
Xu, Yong [3, 4]
Yu, Jiguo [1]
Affiliations
[1] Qufu Normal Univ, Sch Informat Sci & Engn, Rizhao 276826, Peoples R China
[2] Qufu Normal Univ, Lib Qufu Normal Univ, Rizhao 276826, Peoples R China
[3] Harbin Inst Technol, Shenzhen Grad Sch, Biocomp Res Ctr, Shenzhen 518055, Peoples R China
[4] Key Lab Network Oriented Intelligent Computat, Shenzhen 518055, Peoples R China
Keywords
Constrained optimization; feature evaluating and selection; feature extraction or construction; mining methods and algorithms; PENALIZED MATRIX DECOMPOSITION; RNA-SEQ; CANCER; IDENTIFICATION; DISCOVERY
DOI
10.1109/TNB.2016.2574923
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The Cancer Genome Atlas (TCGA) dataset offers new opportunities to study, systematically and comprehensively, the biological mechanisms of cancer formation, growth, and metastasis. Because the TCGA dataset contains heterogeneous data, mining meaningful information from it is one of the bottlenecks in bioinformatics. In this paper, to improve the performance of Robust Principal Component Analysis (RPCA) on such heterogeneous data, we propose a modified RPCA-based method, Block-Constraint Robust Principal Component Analysis (BCRPCA). Since different categories of data have different characteristics, BCRPCA enforces a different constraint intensity on each category. First, BCRPCA decomposes the observation matrix of TCGA data into the sum of two matrices, A and S. Second, a ranking scheme is used to evaluate every feature, and the features are projected onto genes. Genes with high scores are then identified as differentially expressed. The main contributions of this paper are as follows: first, it proposes, for the first time, the idea and method of BCRPCA for modeling TCGA data; second, it provides a BCRPCA-based framework for the integrated analysis of TCGA data. The results show that the method is effective and well suited to analyzing these data.
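The abstract does not state the BCRPCA objective or solver, so the following is only a minimal sketch of the three steps it describes (decompose, rank features, project onto genes), not the authors' implementation. It assumes the standard principal component pursuit form of RPCA, min ||A||_* + λ||S||_1 subject to X = A + S, solved with an inexact augmented Lagrangian loop, and it interprets "different constraint intensities" as one sparsity weight per row block of X (rows as features, columns as samples). The names `block_rpca`, `rank_features`, `gene_scores`, `block_sizes`, `block_weights`, and `feature_gene_ids` are hypothetical and introduced here for illustration.

```python
import numpy as np


def soft_threshold(M, tau):
    """Elementwise shrinkage; tau may be a scalar or a broadcastable array."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)


def block_rpca(X, block_sizes, block_weights, n_iter=200, tol=1e-7):
    """Split X into a low-rank A and a sparse S with X = A + S.

    Assumption: each row block (one data category) gets its own sparsity
    weight, scaled from the usual RPCA default 1 / sqrt(max(m, n)).
    """
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    assert sum(block_sizes) == m, "block sizes must cover every row of X"
    base = 1.0 / np.sqrt(max(m, n))
    # One lambda per row, repeated within each block; shape (m, 1) for broadcasting.
    lam = np.repeat(np.asarray(block_weights, dtype=float) * base, block_sizes)[:, None]

    norm_X = np.linalg.norm(X)
    mu = 1.25 / np.linalg.norm(X, 2)  # penalty parameter, grown each iteration
    rho = 1.5
    A = np.zeros_like(X)
    S = np.zeros_like(X)
    Y = np.zeros_like(X)  # Lagrange multiplier for the constraint X = A + S

    for _ in range(n_iter):
        # A-step: singular value thresholding of X - S + Y / mu.
        U, s, Vt = np.linalg.svd(X - S + Y / mu, full_matrices=False)
        A = (U * np.maximum(s - 1.0 / mu, 0.0)) @ Vt
        # S-step: block-weighted soft thresholding of X - A + Y / mu.
        S = soft_threshold(X - A + Y / mu, lam / mu)
        # Multiplier update and stopping check on the constraint residual.
        R = X - A - S
        Y = Y + mu * R
        mu = min(mu * rho, 1e7)
        if np.linalg.norm(R) <= tol * norm_X:
            break
    return A, S


def rank_features(S):
    """Score each feature (row) by its total absolute sparse signal; highest first."""
    scores = np.abs(S).sum(axis=1)
    return np.argsort(-scores), scores


def gene_scores(S, feature_gene_ids, n_genes):
    """Project feature scores onto genes by summing the scores of features mapped to each gene."""
    _, scores = rank_features(S)
    return np.bincount(feature_gene_ids, weights=scores, minlength=n_genes)
```

A call such as `A, S = block_rpca(X, [10, 20], [1.0, 0.5])` would treat the first 10 rows and the remaining 20 rows of `X` as two data categories with different sparsity weights. How the weights should be chosen, and how features map to genes, are details the abstract leaves open.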
Pages: 510-516
Page count: 7