Iterative bicluster-based least square framework for estimation of missing values in microarray gene expression data

Cited by: 40
Authors
Cheng, K. O. [1 ]
Law, N. F. [1 ]
Siu, W. C. [1 ,2 ]
Affiliations
[1] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Ctr Signal Proc, Hong Kong, Hong Kong, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn EIE, Hong Kong, Hong Kong, Peoples R China
Keywords
Missing value imputation; Biclustering; Iterative estimation; Gene expression analysis; SACCHAROMYCES-CEREVISIAE; IDENTIFICATION; CLASSIFICATION;
DOI
10.1016/j.patcog.2011.10.012
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray experiments inevitably generate gene expression data with missing values, so imputing these values is an important and necessary pre-processing step. Existing imputation methods estimate the missing values by exploiting gene correlation across all experimental conditions. However, related genes are coexpressed only in subsets of the experimental conditions. In this paper, we propose to use biclusters, which contain similar genes under a subset of conditions, to characterize gene similarity and then estimate the missing values. To further improve the accuracy of missing value estimation, an iterative framework is developed with a stopping criterion based on minimizing uncertainty. Extensive experiments have been conducted on artificial datasets, real microarray datasets, and one non-microarray dataset. The proposed bicluster-based approach is able to reduce errors in missing value estimation. (C) 2011 Elsevier Ltd. All rights reserved.
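The general idea in the abstract (estimate a missing entry by least squares from similar genes, using only a subset of conditions, and repeat until the estimates settle) can be illustrated with a rough sketch. This is not the authors' algorithm: the function name `impute_local_ls`, the parameters `k`, `n_cond`, `max_iter`, and `tol`, the row-mean initialization, the crude neighbour and condition selection standing in for a real biclustering step, and the change-based stopping rule (in place of the paper's uncertainty criterion) are all assumptions made for illustration.

```python
import numpy as np


def impute_local_ls(X, k=2, n_cond=3, max_iter=20, tol=1e-6):
    """Iteratively estimate NaN entries of a genes x conditions matrix.

    Hypothetical sketch: assumes every gene has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    est = X.copy()

    # Initial guess: the mean of each gene's observed values.
    rows, _ = np.nonzero(missing)
    est[missing] = np.nanmean(X, axis=1)[rows]

    for _ in range(max_iter):
        prev = est.copy()
        for i, j in zip(*np.nonzero(missing)):
            # Conditions that were actually observed for the target gene.
            obs = np.nonzero(~missing[i])[0]

            # Pick the k genes closest to the target over those conditions.
            diff = prev[:, obs] - prev[i, obs]
            dist = np.sqrt((diff ** 2).sum(axis=1))
            dist[i] = np.inf  # never use the target gene as its own neighbour
            nbrs = np.argsort(dist)[:k]

            # Keep only the conditions where the neighbours track the target
            # most closely (a crude stand-in for a bicluster's condition set).
            dev = (diff[nbrs] ** 2).mean(axis=0)
            conds = obs[np.argsort(dev)[:n_cond]]

            # Least squares: express the target as a combination of neighbours.
            A = prev[np.ix_(nbrs, conds)].T  # shape (n_cond, k)
            b = prev[i, conds]
            w, *_ = np.linalg.lstsq(A, b, rcond=None)
            est[i, j] = prev[nbrs, j] @ w

        # Stop once the estimates no longer change noticeably.
        if np.max(np.abs(est - prev)) < tol:
            break
    return est
```

Restricting the regression to a subset of conditions is what distinguishes this from a plain nearest-neighbour least squares fit over all conditions; a real implementation would obtain that subset from a biclustering algorithm rather than from the simple deviation ranking used here.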
Pages: 1281-1289
Number of pages: 9
Related papers
46 in total
  • [41] A Filter Based Feature Selection Algorithm Using Null Space of Covariance Matrix for DNA Microarray Gene Expression Data
    Sharma, Alok
    Imoto, Seiya
    Miyano, Satoru
    CURRENT BIOINFORMATICS, 2012, 7 (03) : 289 - 294
  • [42] Interval based fuzzy systems for identification of important genes from microarray gene expression data: Application to carcinogenic development
    De, Rajat K.
    Ghosh, Anupam
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (06) : 1022 - 1028
  • [43] Machine Learning Framework for the Prediction of Alzheimer's Disease Using Gene Expression Data Based on Efficient Gene Selection
    El-Gawady, Aliaa
    Makhlouf, Mohamed A.
    Tawfik, BenBella S.
    Nassar, Hamed
    SYMMETRY-BASEL, 2022, 14 (03):
  • [44] CARSVM: A class association rule-based classification framework and its application to gene expression data
    Kianmehr, Keivan
    Alhajj, Reda
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2008, 44 (01) : 7 - 25
  • [45] A deep learning framework for identifying essential proteins based on protein-protein interaction network and gene expression data
    Zeng, Min
    Li, Min
    Fei, Zhihui
    Wu, Fang-Xiang
    Li, Yaohang
    Pan, Yi
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 583 - 588
  • [46] PlantExpress: A Database Integrating OryzaExpress and ArthaExpress for Single-species and Cross-species Gene Expression Network Analyses with Microarray-Based Transcriptome Data
    Kudo, Toru
    Terashima, Shin
    Takaki, Yuno
    Tomita, Ken
    Saito, Misa
    Kanno, Maasa
    Yokoyama, Koji
    Yano, Kentaro
    PLANT AND CELL PHYSIOLOGY, 2017, 58 (01) : e1