Semi-supervised clustering for gene-expression data in multiobjective optimization framework

Cited by: 25
Authors
Alok, Abhay Kumar [1]
Saha, Sriparna [1]
Ekbal, Asif [1]
Affiliation
[1] Indian Inst Technol, Comp Sci Engn, Patna, Bihar, India
Keywords
Gene expression data clustering; Semi-supervised classification; Multiobjective optimization; Cluster validity index; AMOSA; TRANSCRIPTIONAL PROGRAM; OLIGONUCLEOTIDE ARRAYS; COEXPRESSED GENES; ALGORITHM; MICROARRAY; PATTERNS; CLASSIFICATION; INDEXES
DOI
10.1007/s13042-015-0335-8
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Studying the patterns hidden in gene expression data helps in understanding the functions of genes. However, because of the large number of genes and the complexity of biological networks, the resulting mass of data, which often consists of millions of measurements, is difficult to analyze. Clustering techniques are applied to reveal natural structures and to identify interesting patterns in a given gene expression data set. Semi-supervised classification is a relatively new direction in machine learning: it uses a large amount of unlabeled data together with a small amount of labeled data, and it generally performs better than unsupervised classification. However, to the best of our knowledge, no existing work solves the gene expression data clustering problem using semi-supervised classification techniques. In this paper, we address the gene expression data clustering problem with a multiobjective optimization based semi-supervised classification technique, with the aim of obtaining good-quality partitions from only a few labeled data points. To generate the labeled data, the Fuzzy C-means clustering technique is applied first. To determine the partitioning automatically, multiple cluster centers corresponding to each cluster are encoded in the form of a string. The quality of the obtained partitioning is measured by computing the values of five objective functions. The effectiveness of the proposed semi-supervised clustering technique is demonstrated on five publicly available benchmark gene expression data sets. Comparison with existing gene expression data clustering techniques shows that the proposed method is the most effective of the methods compared. Statistical and biological significance tests have also been carried out.
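The abstract states that Fuzzy C-means (FCM) is applied first to generate the labeled data, but it does not say how FCM memberships become labels. The following is a minimal pure-Python sketch under our own assumption: standard FCM with fuzzifier m = 2 is run, and only points whose largest membership reaches a hypothetical threshold (0.9 here) receive a label, while all other points stay unlabeled. The names `fuzzy_c_means` and `pseudo_labels`, the threshold, and the toy data are illustrative and do not come from the paper.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy C-means; returns (centers, memberships)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][j] for i in range(n)) / tw
                            for j in range(d)])
        # Membership update: u_ik = 1 / sum_l (d_ik / d_il) ** (2 / (m - 1)).
        for i in range(n):
            dist = [max(1e-12, math.dist(points[i], centers[k]))
                    for k in range(c)]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
    return centers, u


def pseudo_labels(u, threshold=0.9):
    """ASSUMED rule (not from the paper): label a point only if its
    largest membership is >= threshold; other points stay unlabeled."""
    labeled = {}
    for i, row in enumerate(u):
        k = max(range(len(row)), key=row.__getitem__)
        if row[k] >= threshold:
            labeled[i] = k
    return labeled


# Toy data: two well-separated groups of 2-D "expression profiles".
points = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
centers, u = fuzzy_c_means(points, c=2)
labels = pseudo_labels(u, threshold=0.9)
print(labels)
```

Leaving low-confidence points unlabeled keeps the labeled set small, which matches the semi-supervised setting the abstract describes (a few labeled points alongside many unlabeled ones). Cluster indices returned by FCM are arbitrary, so only the grouping, not the specific label values, is meaningful.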
Pages: 421-439
Page count: 19