A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA

Cited by: 5
Authors
Fan, Shicai [1, 2, 3]
Tang, Jianxiong [1]
Tian, Qi [1]
Wu, Chunguo [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Automat Engn, Chengdu 611731, Sichuan, Peoples R China
[2] Univ Elect Sci & Technol China, Ctr Informat Biol, Chengdu 611731, Sichuan, Peoples R China
[3] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Integrative strategy; Expanded methylation data; Biomarker based feature selection; Robustness; Fuzzy rule; TCGA data; CANCER PROGNOSIS; CLASSIFICATION; DIAGNOSIS; METHYLATION; PREDICTION; BIOMARKERS;
DOI
10.1186/s12920-018-0451-x
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Many studies have addressed the selection of gene signatures that distinguish cancer patients from normal individuals. However, how to extract robust gene features remains an open question.
Methods: In this work, a gene signature selection strategy for TCGA data was proposed that integrates gene expression data, methylation data, and prior knowledge about cancer biomarkers. Unlike traditional integration methods, the expanded 450K methylation data were used instead of the original 450K array data, and reported biomarkers were weighted in the feature selection. A fuzzy rule based classification method and a cross validation strategy were applied in model construction for performance evaluation.
Results: The selected gene features showed prediction accuracy close to 100% in cross validation with the fuzzy rule based classification model on 6 cancers from TCGA. The cross validation performance of the proposed model was similar to that of other integrative models and of the RNA-seq only model, while its prediction performance on independent data was clearly better than that of the other 5 models. The gene signatures extracted with the fuzzy rule based integrative feature selection strategy were more robust and had the potential to give better prediction results.
Conclusion: The results indicated that integrating the expanded methylation data covers more genes and has a greater capacity to retrieve signature genes than the original 450K methylation data. Integrating the reported biomarkers was also a promising way to improve performance. The PTCHD3 gene was selected as a discriminating gene in 3 of the 6 cancers, which suggests that it might play an important role in cancer risk and is worth intensive investigation.
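The idea of weighting reported biomarkers during feature selection can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not give the scoring function or the weighting scheme, so the signal-to-noise score, the function name `rank_genes`, the `prior_weight` parameter, and the toy gene names below are all assumptions made for illustration only.

```python
from statistics import mean, pstdev


def rank_genes(expr, labels, biomarkers, prior_weight=1.5, top_k=2):
    """Rank genes by a class-separation score, boosting reported biomarkers.

    expr: dict mapping gene name -> list of expression values (one per sample).
    labels: list of 0 (normal) / 1 (tumor), in the same sample order.
    biomarkers: set of gene names treated as prior knowledge (hypothetical).
    prior_weight: multiplier applied to biomarker scores (an assumed scheme).
    """
    scores = {}
    for gene, values in expr.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        # Signal-to-noise style score: mean difference over summed spread.
        spread = pstdev(tumor) + pstdev(normal) + 1e-9
        score = abs(mean(tumor) - mean(normal)) / spread
        if gene in biomarkers:
            score *= prior_weight  # prior knowledge raises the gene's rank
        scores[gene] = score
    # Highest score first; keep only the top_k genes.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data: three normal samples followed by three tumor samples.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.2, 4.9],  # strongly separated
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # not separated
    "GENE_C": [1.0, 2.0, 3.0, 3.0, 4.0, 5.0],  # moderately separated
}
labels = [0, 0, 0, 1, 1, 1]

print(rank_genes(expr, labels, biomarkers={"GENE_C"}))
```

In a full pipeline, a ranking of this kind would be computed inside each cross validation fold so that the held-out samples never influence which genes are selected.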
Pages: 9