Autoencoded DNA methylation data to predict breast cancer recurrence: Machine learning models and gene-weight significance

Cited by: 28
Authors
Macias-Garcia, Laura [2]
Martinez-Ballesteros, Maria [1]
Luna-Romera, Jose Maria [1]
Garcia-Heredia, Jose M. [3]
Garcia-Gutierrez, Jorge [1]
Riquelme-Santos, Jose C. [1]
Affiliations
[1] Univ Seville, Sch Comp Engn, Dept Comp Languages & Syst, Seville, Spain
[2] Univ Seville, Fac Med, Dept Citol & Histol, Seville, Spain
[3] Univ Seville, Dept Plant Biochem & Mol Biol, Seville, Spain
Keywords
Autoencoder; Breast cancer; DNA methylation; Feature generation; Machine learning; COMPREHENSIVE MOLECULAR PORTRAITS; INCREASED EXPRESSION; FEATURE-SELECTION; DOWN-REGULATION; CELL-MIGRATION; GENOME-WIDE; 1P; ASSOCIATION; METASTASIS; RESISTANCE
DOI
10.1016/j.artmed.2020.101976
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Breast cancer is the most frequent cancer in women and the second most frequent overall, after lung cancer. Although the 5-year survival rate of breast cancer is relatively high, recurrence is also common and often involves metastasis, with the consequent threat to patients. DNA methylation databases have become a valuable primary source for supervised knowledge extraction regarding breast cancer. Unfortunately, the study of DNA methylation involves processing hundreds of thousands of features for every patient. DNA methylation data are characterized by high dimension and low sample size (HDLSS), a setting with well-known issues for feature selection and generation. Autoencoders (AEs) are a technique suited to nonlinear feature fusion. Our main objective in this work is to design a procedure that summarizes DNA methylation data by taking advantage of AEs. Our proposal generates new features from the values of the CpG sites of patients with and without recurrence. Then, a limited set of genes relevant to characterizing breast cancer recurrence is proposed by applying survival analysis and a weighted ranking of genes according to the distribution of their CpG sites. To test our proposal, we selected a dataset from The Cancer Genome Atlas data portal and an AE with a single hidden layer. The literature review and enrichment analysis (based on genomic context and functional annotation) conducted on the genes obtained in our experiment confirmed that all of these genes were related to breast cancer recurrence.
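As a rough illustration of the feature-generation idea in the abstract, the following is a minimal sketch of a single-hidden-layer autoencoder whose hidden activations serve as new features. The abstract states only that the AE has one hidden layer; the sigmoid activations, squared-error loss, full-batch gradient descent, layer size, learning rate, and the synthetic matrix standing in for CpG beta values are all assumptions made here for illustration, and the names `train_autoencoder` and `encode` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def train_autoencoder(X, n_hidden=8, lr=0.05, epochs=300, seed=0):
    """Fit a single-hidden-layer autoencoder by full-batch gradient descent.

    Assumed setup (not taken from the paper): sigmoid units in both layers
    and a squared reconstruction error summed over features, averaged over
    samples.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b2 = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        # Forward pass: encode to the hidden layer, then reconstruct.
        H = sigmoid(X @ W1 + b1)
        X_hat = sigmoid(H @ W2 + b2)
        err = X_hat - X
        losses.append(float(np.mean(np.sum(err ** 2, axis=1))))
        # Backward pass: gradients of the loss above.
        d_out = 2.0 * err * X_hat * (1.0 - X_hat) / n_samples
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum(axis=0)
        W1 -= lr * (X.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}, losses


def encode(X, params):
    """Return hidden-layer activations, used here as the generated features."""
    return sigmoid(X @ params["W1"] + params["b1"])


# Synthetic stand-in for methylation beta values in (0, 1):
# 40 "patients" by 200 "CpG sites" driven by 3 latent factors.
data_rng = np.random.default_rng(42)
latent = data_rng.normal(size=(40, 3))
loadings = data_rng.normal(size=(3, 200))
X = sigmoid(latent @ loadings)

params, losses = train_autoencoder(X)
features = encode(X, params)
print(features.shape, losses[0], losses[-1])
```

In a real HDLSS setting the input would have far more columns than rows, so the hidden layer size and any regularization would need to be chosen with care; the sketch only shows how encoded features replace the raw CpG columns before downstream modeling.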
Pages: 16