Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches

Cited by: 8
Authors
Jung, Jinmyung [1]
Yoo, Sunyong [2]
Affiliations
[1] Univ Suwon, Coll Informat & Commun Technol, Div Data Sci, Hwaseong 18323, South Korea
[2] Chonnam Natl Univ, Dept ICT Convergence Syst Engn, Gwangju 61005, South Korea
Funding
National Research Foundation of Singapore;
Keywords
metastasis marker; gene expression; machine learning; XGBoost; breast cancer; feature importance; PROTEIN; REGULATOR; RESOURCE;
DOI
10.3390/genes14091820
Chinese Library Classification (CLC) number
Q3 [Genetics];
Subject classification code
071007 ; 090102 ;
Abstract
Cancer metastasis accounts for approximately 90% of cancer deaths, and identifying metastasis markers is a first step toward preventing it. To characterize metastasis marker genes (MGs) of breast cancer, we trained XGBoost models that classify metastasis status using gene expression profiles from TCGA. We then assigned each gene a metastasis score (MS) by calculating the inner product between the gene's feature importances and the AUC performance of the models. Using empirical p-value cutoffs of 0.001, 0.005, and 0.01, we characterized the 54, 202, and 357 genes with the highest MS, respectively, as MGs. We compared the three MG sets with those from existing metastasis marker databases, and most comparisons yielded significant results (p-value < 0.05). The MGs were also significantly enriched in biological processes associated with breast cancer metastasis. Three MGs, SPPL2C, KRT23, and RGS7, showed highly significant results (p-value < 0.01) in survival analysis. MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified through the literature. Additionally, we examined how close the MGs were to each other in protein-protein interaction networks. We expect the characterized markers to help in understanding and preventing breast cancer metastasis.
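The scoring and selection steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes that each trained model reports one importance value per gene and one AUC. It also assumes that the empirical p-value compares a gene's MS against a null distribution of scores, for example scores from models trained on permuted metastasis labels. How that null is built, the p-value formula, the function names, and the gene labels are all hypothetical.

```python
from typing import List, Sequence


def metastasis_scores(importances: Sequence[Sequence[float]],
                      aucs: Sequence[float]) -> List[float]:
    """One MS per gene: the inner product, over models, of the gene's
    feature importance with each model's AUC."""
    if len(importances) != len(aucs):
        raise ValueError("need exactly one AUC per model")
    n_genes = len(importances[0])
    if any(len(row) != n_genes for row in importances):
        raise ValueError("every model must report importances for all genes")
    return [sum(row[g] * auc for row, auc in zip(importances, aucs))
            for g in range(n_genes)]


def empirical_p_values(observed: Sequence[float],
                       null_scores: Sequence[float]) -> List[float]:
    """Assumed definition: p = (1 + #{null >= observed}) / (1 + N).
    The +1 terms keep p strictly positive for a finite null sample."""
    n = len(null_scores)
    return [(1 + sum(1 for s in null_scores if s >= obs)) / (1 + n)
            for obs in observed]


def select_markers(genes: Sequence[str], observed: Sequence[float],
                   null_scores: Sequence[float], cutoff: float) -> List[str]:
    """Genes whose empirical p-value is <= cutoff, ordered by descending MS."""
    pvals = empirical_p_values(observed, null_scores)
    ranked = sorted(zip(genes, observed, pvals),
                    key=lambda t: t[1], reverse=True)
    return [gene for gene, _, p in ranked if p <= cutoff]


if __name__ == "__main__":
    # Hypothetical toy data: 2 models x 3 genes (not from the paper).
    genes = ["GENE_A", "GENE_B", "GENE_C"]
    importances = [[0.5, 0.3, 0.2],
                   [0.1, 0.6, 0.3]]
    aucs = [0.9, 0.7]
    ms = metastasis_scores(importances, aucs)  # approx. [0.52, 0.69, 0.39]
    null = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6]
    print(select_markers(genes, ms, null, cutoff=0.2))  # ['GENE_B', 'GENE_A']
```

Under this assumed p-value definition, a higher MS can never receive a larger p-value. A stricter cutoff therefore returns a prefix of the MS ranking, so the marker sets are nested. This matches the abstract's description of the 54, 202, and 357 MGs as the genes with the highest MS at successively looser cutoffs.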
Pages: 11