A supervised machine learning-based methodology for analyzing dysregulation in splicing machinery: An application in cancer diagnosis

Cited by: 8
Authors
Reyes, Oscar [1,2]
Perez, Eduardo [1,2]
Luque, Raul M. [2,3]
Castano, Justo [2,3]
Ventura, Sebastian [1,2]
Affiliations
[1] Univ Cordoba, Dept Comp Sci & Numer Anal, Cordoba, Spain
[2] Maimonides Biomed Res Inst Cordoba, Cordoba, Spain
[3] Univ Cordoba, Dept Cell Biol Physiol & Immunol, Cordoba, Spain
Keywords
Transcript-based analysis; Alternative Splicing; Feature weighting methods; Classification methods; Explaining classifier's predictions; FEATURE-SELECTION; EXPRESSION; CLASSIFICATION; MUTATIONS; SUBTYPES; IMPROVE; FAMILY
DOI
10.1016/j.artmed.2020.101950
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deregulated splicing machinery components have been shown to be associated with the development of several types of cancer; determining such alterations can therefore help develop tumor-specific molecular targets for early prognosis and therapy. Identifying these splicing components is not straightforward, however, mainly because of tumor heterogeneity, variability across samples, and the "fat-short" nature of genomic datasets (far more features than samples). In this work, a supervised machine learning-based methodology is proposed that determines the subsets of relevant splicing components that best discriminate between samples. The methodology comprises three main phases: first, features are ranked by applying feature weighting algorithms that compute the importance of each splicing component; second, an effective heuristic search determines the best subset of features for inducing an accurate classifier; third, confidence in the induced classifier is assessed by explaining both its individual predictions and its global behavior. Finally, an extensive experimental study conducted on a large collection of transcript-based datasets illustrates the utility and benefit of the proposed methodology for analyzing dysregulation in splicing machinery.
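The first two phases (rank the features, then search for the best subset) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the weighting algorithms, the classifier, or the search strategy, so this example assumes a Fisher-style score as the feature weighting, a nearest-centroid classifier, leave-one-out accuracy as the evaluation criterion, and a simple search over the top-k prefixes of the ranking. All function names are hypothetical, and class labels are assumed to be binary (0/1).

```python
# Hypothetical sketch of a rank-then-search feature selection pipeline.
# Assumptions (not from the paper): Fisher-style weighting, nearest-centroid
# classifier, leave-one-out accuracy, search over top-k ranking prefixes.
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Score each feature by between-class separation over within-class spread (labels 0/1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        spread = pvariance(a) + pvariance(b)
        # The small constant avoids division by zero for constant features.
        scores.append((mean(a) - mean(b)) ** 2 / (spread + 1e-12))
    return scores


def rank_features(X, y):
    """Phase 1 (assumed): order feature indices from most to least important."""
    s = fisher_scores(X, y)
    return sorted(range(len(s)), key=lambda j: s[j], reverse=True)


def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict x's label as the class whose centroid is closest on the selected features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [mean(r[j] for r in rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the nearest-centroid classifier on a feature subset."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]
    return hits / len(X)


def select_prefix(X, y):
    """Phase 2 (assumed): evaluate each top-k prefix of the ranking, keep the best."""
    ranking = rank_features(X, y)
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranking) + 1):
        acc = loo_accuracy(X, y, ranking[:k])
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_k, best_acc = k, acc
    return ranking[:best_k], best_acc


# Toy data (invented for illustration): feature 0 separates the classes,
# features 1 and 2 are noise.
X = [
    [1.0, 5.0, 3.0],
    [1.2, 2.0, 7.0],
    [0.9, 4.0, 1.0],
    [3.0, 5.0, 6.0],
    [3.2, 2.0, 2.0],
    [2.9, 4.0, 5.0],
]
y = [0, 0, 0, 1, 1, 1]

print(rank_features(X, y))  # -> [0, 2, 1]
print(select_prefix(X, y))  # -> ([0], 1.0)
```

Searching only over prefixes of the ranking keeps the cost linear in the number of features, which matters for fat-short data; a real study would replace the toy scorer and classifier with the weighting algorithms and classifiers actually under evaluation, and would add the third (explanation) phase on top of the selected model.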
Pages: 13