Classification of breast cancer subtypes: A study based on representative genes

Cited by: 0
Authors
Mendonca-Neto R. [1]
Reis J. [1]
Okimoto L. [1]
Fenyö D. [2]
Silva C. [2]
Nakamura F. [1]
Nakamura E. [1]
Affiliations
[1] Federal University of Amazonas, Brazil
[2] New York University, United States
Keywords
Breast Cancer; Gene Expression; Subtypes Classification
DOI
10.5753/jbcs.2022.2209
Abstract
Breast cancer is the second most common cancer type and the leading cause of cancer-related deaths worldwide. Because it is a heterogeneous disease, subtyping breast cancer plays an important role in choosing a specific treatment. In this work, we propose an evaluation framework that applies different machine learning techniques for a broader analysis of the PAM50 gene list in the classification of breast cancer subtypes. The experiments show that the best-performing method for classifying breast cancer subtypes is the SVM with a linear kernel, which achieved an F1 score of 0.98 for the Basal subtype and 0.90 for the HER2 subtype, the two subtypes with the worst prognosis. We also present a gene analysis of the classification methods using SHAP values, identifying which genes are important for the classification of each subtype. © 2022, Brazilian Computing Society. All rights reserved.
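The abstract reports a separate F1 score for each subtype. As a minimal sketch of how such a per-class score can be computed, the snippet below counts true positives, false positives, and false negatives one-vs-rest for each label. The function name `per_class_f1` and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the sample was not p
            fn[t] += 1  # sample was t, but the prediction missed it
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels, NOT data from the paper.
y_true = ["Basal", "Basal", "HER2", "HER2", "LumA", "LumA"]
y_pred = ["Basal", "Basal", "HER2", "LumA", "LumA", "LumA"]
scores = per_class_f1(y_true, y_pred)
```

Reporting F1 per subtype rather than a single accuracy figure makes it visible when a classifier does well on a common subtype but poorly on a rarer one.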
Pages: 59-68
Page count: 9
Related papers
36 entries in total
  • [1] Alanni R., Hou J., Azzawi H., Xiang Y., Deep gene selection method to select genes from microarray datasets for cancer classification, BMC bioinformatics, 20, 608, pp. 1-15, (2019)
  • [2] Badve S., Turbin D., Thorat M. A., Morimiya A., Nielsen T. O., Perou C. M., Dunn S., Huntsman D. G., Nakshatri H., FOXA1 expression in breast cancer: correlation with luminal subtype A and survival, Clinical cancer research, 13, 15, pp. 4415-4421, (2007)
  • [3] Baldi P., Brunak S., Chauvin Y., Andersen C. A., Nielsen H., Assessing the accuracy of prediction algorithms for classification: an overview, Bioinformatics, 16, 5, pp. 412-424, (2000)
  • [4] Bergstra J., Bengio Y., Random search for hyper-parameter optimization, The Journal of Machine Learning Research, 13, 1, pp. 281-305, (2012)
  • [5] Bi Y., Xiang D., Ge Z., Li F., Jia C., Song J., An interpretable prediction model for identifying N7-methylguanosine sites based on XGBoost and SHAP, Molecular Therapy-Nucleic Acids, 22, pp. 362-372, (2020)
  • [6] Bray F., Ferlay J., Soerjomataram I., Siegel R., Torre L., Jemal A., Global cancer statistics 2018, CA: A Cancer Journal for Clinicians, 68, pp. 394-424, (2018)
  • [7] Chen X., Hu H., He L., Yu X., Liu X., Zhong R., Shu M., A novel subtype classification and risk of breast cancer by histone modification profiling, Breast cancer research and treatment, 157, 2, pp. 267-279, (2016)
  • [8] Chia S. K., Bramwell V. H., Tu D., et al., A 50-gene intrinsic subtype classifier for prognosis and prediction of benefit from adjuvant tamoxifen, Clinical cancer research, 18, 16, pp. 4465-4472, (2012)
  • [9] Chicco D., Ten quick tips for machine learning in computational biology, BioData mining, 10, 1, pp. 1-17, (2017)
  • [10] Dai X., Li T., Bai Z., Yang Y., Liu X., Zhan J., Shi B., Breast cancer intrinsic subtype classification, clinical use and future trends, American journal of cancer research, 5, 10, (2015)