A novel minorization-maximization framework for simultaneous feature selection and clustering of high-dimensional count data

Cited by: 1

Authors:

Zamzami, Nuha [1]

Bouguila, Nizar [2]

Affiliations:

[1] Univ Jeddah, Coll Comp Sci & Engn, Dept Comp Sci & Artificial Intelligence, Jeddah, Saudi Arabia

[2] Concordia Univ, Concordia Inst Informat Syst Engn CIISE, Montreal, PQ, Canada

Source:

PATTERN ANALYSIS AND APPLICATIONS | 2023 / Vol. 26 / Issue 01

Keywords:

Feature saliency; Feature selection; Model selection; Unsupervised learning; Count data; Mixture models; Generalized Dirichlet multinomial; Maximum likelihood; Minorization-maximization; UNSUPERVISED FEATURE-SELECTION; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; ALGORITHM; CLASSIFICATION; MIXTURES;

DOI:

10.1007/s10044-022-01094-z

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Count data are commonly used in machine learning and computer vision applications; however, they often suffer from the well-known curse of dimensionality, which dramatically degrades the performance of clustering algorithms. Feature selection is a major technique for handling a large number of features, most of which are often redundant and noisy. In this paper, we propose a probabilistic approach for count data based on the concept of feature saliency in the context of mixture-based clustering using the generalized Dirichlet multinomial distribution. The saliency of irrelevant features is driven toward zero by minimizing the message length, which amounts to performing feature selection and model selection simultaneously. Experiments on a range of challenging applications, including text and image clustering, show that the developed approach is effective in identifying both the optimal number of clusters and the most important features, and thereby significantly improves clustering performance.

Pages: 91-106

Page count: 16
