Unsupervised Feature Selection Using an Integrated Strategy of Hierarchical Clustering With Singular Value Decomposition: An Integrative Biomarker Discovery Method With Application to Acute Myeloid Leukemia

Cited by: 6
Authors
Bhadra, Tapas [1]
Mallik, Saurav [2]
Sohel, Amir [1]
Zhao, Zhongming [3, 4]
Affiliations
[1] Aliah Univ, Dept Comp Sci & Engn, Kolkata 700160, India
[2] Univ Texas Hlth Sci Ctr Houston, Ctr Precis Hlth, Sch Biomed Informat, Houston, TX 77030 USA
[3] Univ Texas Hlth Sci Ctr Houston, Ctr Precis Hlth, Sch Biomed Informat, Houston, TX 77030 USA
[4] Univ Texas Hlth Sci Ctr Houston, Human Genet Ctr, Sch Publ Hlth, MD Anderson Canc Ctr,UTHlth Grad Sch Biomed Sci, Houston, TX 77030 USA
Keywords
Feature extraction; Mutual information; Entropy; Mathematical model; Task analysis; Laplace equations; Clustering algorithms; Pattern recognition; Unsupervised feature selection; Normalized mutual information; Expression; Relevance
DOI
10.1109/TCBB.2021.3110989
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In this article, we propose a novel unsupervised feature selection method that combines hierarchical feature clustering with singular value decomposition (SVD). The proposed algorithm first generates several feature clusters by applying hierarchical clustering to the feature space, and then applies SVD to each feature cluster to identify the feature that contributes most to the SVD-entropy. The method selects an optimal feature subset that not only minimizes the mutual dependency among the selected features but also, to some extent, maximizes the mutual dependency between the selected features and their nearest-neighbor non-selected features. Each selected feature also contributes the maximum SVD-entropy among all features in its cluster. The experimental results demonstrate that the proposed algorithm performs well against several state-of-the-art feature selection methods on evaluation criteria such as classification accuracy, redundancy rate, and representation entropy. The superiority of the proposed algorithm is demonstrated through analysis of Acute Myeloid Leukemia (AML) multi-omics data from The Cancer Genome Atlas (TCGA), consisting of five datasets: gene expression, exon expression, methylation, microRNA, and pathway activity (PARADIGM IPLs). Our analysis pinpoints a candidate gene marker for AML, EREG, supported by integrative omics evidence. EREG is targeted by two top-ranked microRNAs in these datasets, hsa-miR-1286 and hsa-miR-1976. The method and results will be useful for biomarker discovery in the era of precision medicine.
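The two-stage idea in the abstract (cluster the features, then keep one feature per cluster by SVD-entropy contribution) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: the feature dissimilarity is taken as 1 - |Pearson correlation| rather than any measure used in the paper, average linkage is used, the contribution of a feature is computed as the leave-one-out drop in normalized SVD-entropy, no feature is constant, and the names `svd_entropy` and `select_features` are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def svd_entropy(X):
    """Normalized SVD-entropy of a samples-by-features matrix."""
    s = np.linalg.svd(X, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0 or len(s) < 2:
        return 0.0
    v = energy / total          # normalized squared singular values
    v = v[v > 0]                # skip zero terms, since 0 * log(0) is taken as 0
    return float(-(v * np.log(v)).sum() / np.log(len(s)))


def select_features(X, n_clusters):
    """Return one feature index per cluster (hypothetical helper)."""
    # Assumption: 1 - |Pearson correlation| as the feature dissimilarity.
    corr = np.corrcoef(X, rowvar=False)
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    np.fill_diagonal(dist, 0.0)

    # Agglomerative clustering of the features (average linkage is an assumption).
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")

    selected = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        if len(members) == 1:
            selected.append(int(members[0]))
            continue
        block = X[:, members]
        full = svd_entropy(block)
        # Leave-one-out contribution of each feature to the cluster's SVD-entropy.
        contrib = [full - svd_entropy(np.delete(block, j, axis=1))
                   for j in range(len(members))]
        selected.append(int(members[int(np.argmax(contrib))]))
    return sorted(selected)


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 8))     # 30 samples, 8 synthetic features
selected_demo = select_features(X_demo, n_clusters=3)
```

Because `fcluster` with `criterion="maxclust"` yields at most the requested number of clusters, `selected_demo` holds at most three feature indices; the exact indices depend on the synthetic data.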
Pages: 1354-1364
Number of pages: 11