Comparison Analysis of Gene Expression Profiles Proximity Metrics

Cited by: 6
Authors
Babichev, Sergii [1, 4]
Yasinska-Damri, Lyudmyla [2 ]
Liakh, Igor [3 ]
Durnyak, Bohdan [2 ]
Affiliations
[1] Kherson State Univ, Dept Phys, UA-73000 Kherson, Ukraine
[2] Ukrainian Acad Printing, Dept Comp Sci & Informat Technol, UA-79000 Lvov, Ukraine
[3] Uzhgorod Natl Univ, Dept Informat Phis & Math Disciplines, UA-88000 Uzhgorod, Ukraine
[4] Kherson State Univ, Dept Phys, Univ St 27, UA-73000 Kherson, Ukraine
Source
SYMMETRY-BASEL | 2021 / Vol. 13 / Issue 10
Keywords
symmetry of molecular elements interactions; gene expression profiles; mutual information maximization criterion; correlation distance; Pearson's χ² test; Harrington desirability index; classification accuracy; hybrid proximity metric; particle swarm optimization; feature-selection algorithm; classification; recovery
DOI
10.3390/sym13101812
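Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09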
Abstract
Gene regulatory network (GRN) reconstruction and the creation of effective disease diagnostic systems based on gene expression data are among the current directions of modern bioinformatics. In this manuscript, we present the results of research evaluating the effectiveness of the most commonly used metrics for estimating the proximity of gene expression profiles. These metrics can be used to extract groups of informative gene expression profiles while taking into account the states of the investigated samples. Symmetry is very important in the field of gene and/or protein interaction, since it undergirds essentially all interactions between molecular components in the GRN. Extracting gene expression profiles that allow us to identify the state of the investigated biological objects (disease, state of patients, etc.) contributes to further GRN reconstruction in terms of both symmetry and understanding the mechanism of molecular element interaction in a biological organism. Within the framework of our research, we investigated the following metrics: mutual information maximization (MIM) using various methods of Shannon entropy calculation, Pearson's χ² test, and correlation distance. The classification accuracy of the investigated samples was used as the main quality criterion for evaluating the effectiveness of each metric. A random forest (RF) classifier was used during the simulation process. The results have shown that the various methods of Shannon entropy calculation used within the MIM metric disagree with each other. As a result, we propose the modified mutual information maximization (MMIM) proximity metric, based on the joint use of various methods of Shannon entropy calculation and the Harrington desirability function.
The simulation results have also shown that the correlation proximity metric is less effective than both the MMIM metric and Pearson's χ² test. Finally, we propose a hybrid proximity metric (HPM) that takes into account both the MMIM metric and Pearson's χ² test. The proposed metric was investigated by evaluating the effectiveness of a one-cluster structure. In our view, the main benefit of the proposed HPM is that it increases the objectivity of extracting mutually similar gene expression profiles through the joint use of several effective proximity metrics that can contradict each other when used alone.
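The abstract names several proximity metrics without giving their formulas. The following is only a minimal sketch of how two of them, and the Harrington desirability function, are commonly computed; it is not the authors' implementation. It assumes a plug-in (maximum likelihood) Shannon entropy estimate over equal-width histogram bins, the standard one-sided Harrington form d = exp(-exp(-Y)), and a geometric mean to aggregate partial desirabilities. All function names and the `bins` parameter are hypothetical choices made for illustration.

```python
import numpy as np


def correlation_distance(x, y):
    """Correlation distance: 1 minus the Pearson correlation coefficient."""
    r = np.corrcoef(x, y)[0, 1]
    return 1.0 - r


def shannon_entropy(counts):
    """Plug-in Shannon entropy (in bits) computed from histogram bin counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())


def mutual_information(x, y, bins=4):
    """Histogram estimate of I(X;Y) = H(X) + H(Y) - H(X,Y).

    The number of bins is an assumption; the abstract does not specify
    how the expression values are discretized.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = shannon_entropy(joint.sum(axis=1))
    h_y = shannon_entropy(joint.sum(axis=0))
    h_xy = shannon_entropy(joint.ravel())
    return h_x + h_y - h_xy


def harrington_desirability(y):
    """One-sided Harrington desirability d = exp(-exp(-y)), with values in (0, 1)."""
    return np.exp(-np.exp(-np.asarray(y, dtype=float)))


def combined_desirability(values):
    """Geometric mean of the partial desirabilities of `values`.

    Uses log(d) = -exp(-y) directly to avoid taking the log of an underflowed d.
    """
    values = np.asarray(values, dtype=float)
    return float(np.exp(np.mean(-np.exp(-values))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    profile_a = rng.normal(size=200)                     # synthetic "expression profile"
    profile_b = profile_a + 0.1 * rng.normal(size=200)   # near-copy of profile_a
    profile_c = rng.normal(size=200)                     # unrelated profile
    print(correlation_distance(profile_a, profile_b),
          correlation_distance(profile_a, profile_c))
    print(mutual_information(profile_a, profile_b),
          mutual_information(profile_a, profile_c))
```

On synthetic data such as the above, a closely related pair of profiles should show a smaller correlation distance and a larger mutual information estimate than an unrelated pair. Swapping in a different entropy estimator changes the mutual information values, which is the kind of disagreement between estimators that the abstract reports.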
Pages: 16