ClearF: a supervised feature scoring method to find biomarkers using class-wise embedding and reconstruction
Cited by: 3
Authors:
Wang, Sehee [1]
Jeong, Hyun-Hwan [2,3]
Sohn, Kyung-Ah [1]
Affiliations:
[1] Ajou Univ, Dept Comp Engn, Suwon 16499, South Korea
[2] Texas Childrens Hosp, Jan & Dan Duncan Neurol Res Inst, Houston, TX 77030 USA
[3] Baylor Coll Med, Dept Mol & Human Genet, Houston, TX 77030 USA
Feature selection;
Feature scoring;
Mutual information (MI);
Breast cancer;
Dimension reduction;
Low-dimensional embedding;
Reconstruction error;
Principal component analysis (PCA);
FEATURE-SELECTION;
GENE-EXPRESSION;
INFORMATION;
ACTIVATION;
SIGNATURES;
SUBTYPES;
DOI:
10.1186/s12920-019-0512-9
Chinese Library Classification (CLC) number:
Q3 [Genetics];
Discipline classification codes:
071007;
090102;
Abstract:
Background: Feature selection and feature scoring methods are essential for detecting biomarkers in bioinformatics. Many such methods have been developed, and several studies have employed information-theoretic approaches. However, most of these methods require a long processing time. In addition, information-theoretic methods discretize continuous features, which can lead to a loss of information.

Results: In this paper, a novel supervised feature scoring method named ClearF is proposed. The method is suitable for continuous-valued data and follows a principle similar to feature selection using mutual information, with the added advantage of reduced computation time. The score calculation is motivated by the association between the reconstruction error and the information-theoretic measurement. The method is based on class-wise low-dimensional embedding and the resulting reconstruction error. Given a multi-class dataset, such as a case-control study dataset, low-dimensional embedding is first applied to each class to obtain a compressed representation of that class, and also to the entire dataset. Reconstruction is then performed to calculate the error of each feature, and the final score of each feature is defined in terms of these reconstruction errors. The correlation between the information-theoretic measurement and the proposed score is demonstrated using a simulation. For performance validation, the classification performance of the proposed method was compared with that of various algorithms on benchmark datasets.

Conclusions: The proposed method showed higher accuracy and lower execution time than the other established methods. Moreover, an experiment on the TCGA breast cancer dataset confirmed that the genes with the highest scores were highly associated with subtypes of breast cancer.
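
The class-wise embedding and reconstruction idea described in the abstract can be illustrated with a small, dependency-free Python sketch. This is not the authors' implementation. The abstract does not give the score formula, so the sketch assumes, by analogy with mutual information, that a feature's score is its reconstruction error under an embedding of the whole dataset minus the sum of its errors under the per-class embeddings. It also assumes a rank-1 PCA embedding, computed here by power iteration. The function names `top_component`, `feature_errors`, and `toy_scores` and the synthetic data are hypothetical.

```python
import math
import random


def top_component(rows, n_iter=100):
    """Return (feature means, first principal direction) via power iteration."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Scatter matrix; its leading eigenvector is the first principal direction.
    scatter = [[sum(c[i] * c[j] for c in centered) for j in range(d)]
               for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(scatter[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def feature_errors(rows):
    """Per-feature squared reconstruction error after a rank-1 embedding."""
    mean, v = top_component(rows)
    d = len(mean)
    errors = [0.0] * d
    for r in rows:
        centered = [r[j] - mean[j] for j in range(d)]
        z = sum(centered[j] * v[j] for j in range(d))  # 1-D embedding
        for j in range(d):
            errors[j] += (centered[j] - z * v[j]) ** 2
    return errors


def toy_scores(rows, labels):
    """ASSUMED score: whole-data error minus summed class-wise errors."""
    total = feature_errors(rows)
    within = [0.0] * len(total)
    for c in sorted(set(labels)):
        class_rows = [r for r, y in zip(rows, labels) if y == c]
        for j, e in enumerate(feature_errors(class_rows)):
            within[j] += e
    return [t - w for t, w in zip(total, within)]


# Synthetic two-class data: feature 0 shifts with the class label,
# features 1 and 2 are strongly correlated and carry no class signal.
rng = random.Random(0)
rows, labels = [], []
for i in range(200):
    y = i % 2
    z = rng.gauss(0.0, 3.0)
    f0 = (1.0 if y else -1.0) + rng.gauss(0.0, 0.1)
    rows.append([f0, z, z + rng.gauss(0.0, 0.1)])
    labels.append(y)

scores = toy_scores(rows, labels)
best_feature = scores.index(max(scores))
print(best_feature, scores)
```

In this toy setup the class-dependent feature should receive by far the largest score, because centering each class separately removes the class shift that the whole-data embedding cannot reconstruct. The number of embedding dimensions, any weighting of the class-wise errors, and the embedding method itself are choices made here for illustration only.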