A survey on single and multi omics data mining methods in cancer data classification

Times cited: 39
Authors
Momeni, Zahra [1]
Hassanzadeh, Esmail [1]
Abadeh, Mohammad Saniee [1,2]
Bellazzi, Riccardo [3,4]
Affiliations
[1] Tarbiat Modares Univ, Fac Elect & Comp Engn, Tehran, Iran
[2] Inst Res Fundamental Sci IPM, Sch Comp Sci, Tehran, Iran
[3] Univ Pavia, Dept Elect Comp & Biomed Engn, Pavia, Italy
[4] IRCCS ICS Maugeri, Pavia, Italy
Keywords
Cancer classification; Single and multi omics data; Gene selection; High dimensional datasets; Data integration; PARTICLE SWARM OPTIMIZATION; FEATURE SUBSET-SELECTION; ENSEMBLE FEATURE-SELECTION; GENE SELECTION; ALGORITHM; DISEASE; SEARCH; ROBUST; REGULARIZATION; DISCRIMINANT
DOI
10.1016/j.jbi.2020.103466
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Data analytics is routinely used to support biomedical research in all areas, with particular focus on the most relevant clinical conditions, such as cancer. Bioinformatics approaches, in particular, have been used to characterize the molecular aspects of diseases. In recent years, numerous studies have been performed on cancer based on single- and multi-omics data. Single-omics studies, for example, have employed a diverse set of data types, such as gene expression, DNA methylation, and miRNA, to name only a few. Nevertheless, a significant part of the literature reports studies of gene expression using microarray datasets. Single-omics datasets have a very large number of attributes and very few samples, which makes them a paradigmatic example of an under-sampled, small-n, large-p machine learning problem. An important goal of single-omics data analysis is to find, among the available data, the genes most relevant for potential use in clinics and research. This problem has been addressed through gene selection, one of the pre-processing steps in data mining. Analyses that use only one type of data (single-omics) often miss the complex landscape of molecular phenomena underlying the disease. As a result, they provide limited and sometimes unreliable information about disease mechanisms. Therefore, in recent years, researchers have been eager to build more complex models that obtain more reliable results from multi-omics data. The most important challenge in doing so, however, is data integration. In this paper, we provide a comprehensive overview of the challenges in single- and multi-omics analysis of cancer data, focusing on gene selection and data integration methods.
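To make the two themes of the abstract concrete, here is a minimal sketch, assuming binary class labels (0/1) and samples-by-features matrices in which every class has at least one sample. It ranks genes with a Fisher-score filter (one simple filter criterion chosen for illustration, not a method taken from the survey) and then performs a naive early integration by concatenating the columns retained from each omics block. The function names `fisher_scores`, `select_top_k` and `early_integrate`, as well as the toy matrices, are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score of each column of X for binary labels y (0/1).

    score_j = (mean_1 - mean_0)^2 / (var_1 + var_0); a higher score means the
    feature separates the two classes better relative to its within-class spread.
    Assumes each class has at least one sample.
    """
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, label in zip(X, y) if label == 0]
        col1 = [row[j] for row, label in zip(X, y) if label == 1]
        num = (mean(col1) - mean(col0)) ** 2
        den = pvariance(col1) + pvariance(col0)
        if den > 0:
            scores.append(num / den)
        else:
            # Zero within-class spread: perfectly separating if the means differ.
            scores.append(float("inf") if num > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Filter-style gene selection: column indices of the k highest scores."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def early_integrate(blocks, y, k_per_block):
    """Naive early (concatenation-based) integration of several omics blocks.

    Each block is a samples x features matrix with the same sample order.
    Genes are filtered within each block, then retained columns are concatenated.
    """
    selected = [select_top_k(X, y, k_per_block) for X in blocks]
    merged = []
    for i in range(len(y)):
        row = []
        for X, idx in zip(blocks, selected):
            row.extend(X[i][j] for j in idx)
        merged.append(row)
    return selected, merged


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples, far fewer than features in real omics data.
    y = [0, 0, 1, 1]
    expr = [[1.0, 5.0, 0.1, 2.0, 3.0],
            [1.2, 5.1, 0.2, 2.1, 2.9],
            [3.0, 5.0, 0.1, 2.2, 3.1],
            [3.2, 4.9, 0.2, 2.0, 3.0]]
    meth = [[0.9, 0.5, 0.3],
            [0.8, 0.5, 0.4],
            [0.1, 0.5, 0.3],
            [0.2, 0.6, 0.4]]
    selected, merged = early_integrate([expr, meth], y, k_per_block=1)
    print(selected)  # [[0], [0]]
    print(merged)
```

Concatenation is the simplest integration strategy; it ignores differences in scale and noise between omics types, which is part of why data integration is singled out as the main challenge for multi-omics models.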
Pages: 17