Improving the accuracy of multiclass classification in machine learning: A case study in a cell signaling dataset

Cited by: 3
Authors
Gonzalez-Perez, Pedro Pablo [1]
Sanchez-Gutierrez, Maximo Eduardo [2]
Affiliations
[1] Univ Autonoma Metropolitana Cuajimalpa, Dept Matemat Aplicadas & Sistemas, Ciudad De Mexico, Mexico
[2] Univ Autonoma Ciudad Mexico, Colegio Ciencia & Tecnol, Ciudad De Mexico, Mexico
Keywords
Multiclass classification; machine learning; exploratory data analysis; dimensionality reduction; cellular signaling data; feature selection; diagnosis
DOI
10.3233/IDA-215826
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Making sense of data within its context is essential to proposing a model that usefully solves a problem. Such domain knowledge includes information that is not contained in the data but that helps us understand the data fed into a machine-learning algorithm and indicates which features might help the model. Nevertheless, domain knowledge may become insufficient as the number of input variables grows, which makes automated feature selection techniques necessary. In this study, we investigate whether the joint use of 1) feature selection techniques, such as Chi-square, Tree-based Feature Selection, Pearson's Correlation, LASSO, Low Variance, and Recursive Feature Elimination, 2) outlier detection methods, such as Isolation Forest, and 3) cross-validation techniques improves accuracy in multiclass classification. Specifically, we address the classification of patterns representing the activation state of cell signaling components into classes that symbolize the different cellular processes triggered in cancer cells. The results show an accuracy increase with about 80% fewer input features, using only 3 of the 16 original descriptors.
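The combination described in the abstract (outlier detection, then feature selection, evaluated with cross-validation) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it assumes scikit-learn, a synthetic 16-feature dataset standing in for the cell signaling data, a Chi-square filter as the one selection technique shown, and a random forest as the classifier, none of which the abstract specifies.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the real data (an assumption): 16 descriptors,
# 3 of them informative, 4 classes.
X, y = make_classification(
    n_samples=600, n_features=16, n_informative=3, n_redundant=2,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Step 1: outlier detection. fit_predict returns 1 for inliers, -1 for outliers.
inlier_mask = IsolationForest(contamination=0.05, random_state=0).fit_predict(X) == 1
X_clean, y_clean = X[inlier_mask], y[inlier_mask]

# Step 3 setup: stratified k-fold cross-validation shared by both models.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Baseline: classifier trained on all 16 descriptors.
baseline = RandomForestClassifier(n_estimators=100, random_state=0)

# Step 2: Chi-square selection of 3 features. chi2 needs non-negative inputs,
# hence the MinMaxScaler. Keeping selection inside the pipeline means it is
# refit on each training fold, so the held-out fold never influences it.
reduced = Pipeline([
    ("scale", MinMaxScaler()),
    ("select", SelectKBest(score_func=chi2, k=3)),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

acc_all = cross_val_score(baseline, X_clean, y_clean, cv=cv, scoring="accuracy").mean()
acc_top3 = cross_val_score(reduced, X_clean, y_clean, cv=cv, scoring="accuracy").mean()

# Refit on all cleaned data only to report which columns were kept.
reduced.fit(X_clean, y_clean)
selected = np.flatnonzero(reduced.named_steps["select"].get_support())

print(f"samples kept after outlier removal: {X_clean.shape[0]} of {X.shape[0]}")
print(f"mean CV accuracy, all 16 features: {acc_all:.3f}")
print(f"mean CV accuracy, 3 selected features: {acc_top3:.3f}")
print(f"selected feature indices: {selected.tolist()}")
```

Whether the reduced model matches or beats the baseline depends on the data; on the synthetic set above the comparison is only illustrative. Swapping `SelectKBest` for `RFE`, `SelectFromModel`, or `VarianceThreshold` would exercise the other selection families the abstract lists.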
Pages: 481-500
Page count: 20